Merge branch 'bedboss'

databio · Mar 1, 2024 · a81c1f8
2 parents f5e4422 + d102f99

Showing 8 changed files with 131 additions and 17 deletions.
diff --git a/docs/bedboss/tutorials/bedbuncher_tutorial.md b/docs/bedboss/tutorials/bedbuncher_tutorial.md
@@ -1 +1,29 @@
-### 🚧 Tutorial in progress! Stay tuned for updates. We're working hard to bring you valuable content soon!
+### BEDbuncher
+
+BEDbuncher is used to create a bedset from BED files in the BEDbase database.
+
+### 1) Create bedbase config file
+### 2) Create a PEP with BED file record identifiers
+To do so, create a PEP whose sample table contains either a `sample_name` column alone (where each `sample_name` is a record identifier) or both `sample_name` and `record_identifier` columns.
+For example, a sample table:
+
+| sample_name | record_identifier |
+|----------|----------|
+| sample1 | asdf3215f34 |
+| sample2 | a23452f34tf | 
+
+### 3) Run bedboss bunch
+#### From command line
+```bash
+bedboss bunch \
+  --bedbase-config path/to/bedbase_config.yaml \
+  --bedset-name bedset1 \
+  --pep path/to/pep.yaml \
+  --bedset-pep bedset_pep.yaml \
+  --cache-path CACHE_PATH
+```
+
+### Run bedboss bunch from within Python
+```python
+
+```
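
One possible way to prepare the PEP from step 2 of the BEDbuncher tutorial, sketched in Python with only the standard library. The `write_bedbuncher_pep` helper, the file names, and the `pep_version: 2.1.0` value are assumptions made for illustration. The record identifiers are the placeholder values from the tutorial table, not real records.

```python
import csv
from pathlib import Path

# Placeholder identifiers copied from the tutorial table; replace with real ones.
SAMPLES = [
    {"sample_name": "sample1", "record_identifier": "asdf3215f34"},
    {"sample_name": "sample2", "record_identifier": "a23452f34tf"},
]


def write_bedbuncher_pep(out_dir, samples=SAMPLES):
    """Write sample_table.csv and pep.yaml (hypothetical names) into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Sample table with the two columns named in the tutorial.
    table_path = out_dir / "sample_table.csv"
    with table_path.open("w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["sample_name", "record_identifier"]
        )
        writer.writeheader()
        writer.writerows(samples)

    # Minimal PEP config pointing at the sample table (assumed PEP version).
    config_path = out_dir / "pep.yaml"
    config_path.write_text("pep_version: 2.1.0\nsample_table: sample_table.csv\n")
    return config_path
```

The returned path is the kind of value that would be passed to `--pep` in the command above.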
diff --git a/docs/bedboss/tutorials/bedindex_tutorial.md b/docs/bedboss/tutorials/bedindex_tutorial.md
@@ -1 +1,21 @@
-### 🚧 Tutorial in progress! Stay tuned for updates. We're working hard to bring you valuable content soon!
+### Indexing to qdrant database
+
+### 1. Create bedbase config file
+### 2. Run bedboss index
+
+#### From command line
+```bash
+bedboss index --bedbase-config path/to/bedbase_config.yaml
+```
+
+After running this command, all files in the database that have not yet been indexed will be indexed in the Qdrant database.
+
+
+#### From within Python
+```python
+from bedboss.qdrant_index import add_to_qdrant
+
+add_to_qdrant(
+    bedbase_config="path/to/bedbase_config.yaml"
+)
+```
diff --git a/docs/bedboss/tutorials/tutorial_insert.md b/docs/bedboss/tutorials/tutorial_insert.md
@@ -1,21 +1,21 @@
 ## Bedboss insert 
 
-Bedboss insert is intended to run each sample in provided PEP. 
-PEP can be provided as a file or as a registry path of the PEPhub.
+Bedboss insert is designed to process each sample in the provided PEP. 
+The PEP can be provided either as a path to a config file or as a PEPhub registry path.
 
 
 ### Step 1: Install all dependencies
 
-First you have to install bedboss and check if all requirements are satisfied. 
-To do so, you can run next command:
+First, you have to install bedboss and check if all requirements are satisfied. 
+To do so, you can run the following command:
 ```bash
 bedboss requirements-check
 ```
 If requirements are not satisfied, you will see the list of missing packages.
 
 ### Step 2: Create bedconf.yaml file 
 To run bedboss insert, you need to create a bedconf.yaml file with configuration. 
-Detail instructions are in the configuration section.
+Detailed instructions are in the configuration section.
 
 ### Step 3: Create PEP with bed files.
 BEDboss PEP should contain next fields: sample_name, input_file, input_type, genome.
@@ -33,14 +33,14 @@ bedboss insert \
 
 ```
 
-Above command will run bedboss on the bed file and create a bedstat file in the output directory.
+The above command will run bedboss on the BED file and create a file with statistics in the output directory. 
 It contains only required parameters. For more details, please check the usage section.
 
-By default, results will be uploaded only to postgres database.
-- To upload results to PEPhub, you need to make `databio` org available on GitHub, then login to PEPhub, and add `--upload-pephub` flag to the command.
-- To upload results to Qdrant, you need to add `--upload-qdrant` flag to the command.
-- To upload actual files to s3, you need to add `--upload-s3` flag to the command, and Before uploading you have to set up all necessary env vars: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_ENDPOINT_URL.
-- To create bedset of provided pep files, you need to add `--create-bedset` flag to the command.
+By default, results will be uploaded only to the PostgreSQL database.
+- To upload results to PEPhub, you need to make the `databio` org available on GitHub, then log in to PEPhub, and add the `--upload-pephub` flag to the command.
+- To upload results to Qdrant, you need to add the `--upload-qdrant` flag to the command.
+- To upload actual files to S3, you need to add the `--upload-s3` flag to the command, and before uploading, you have to set up all necessary environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_ENDPOINT_URL.
+- To create a bedset of provided pep files, you need to add the `--create-bedset` flag to the command.
 
 
 ---
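
The S3 bullet above lists three environment variables that must be set before using `--upload-s3`. A minimal pre-flight check could look like the sketch below. The `missing_s3_vars` helper is hypothetical and not part of bedboss; it only reports which of the three names are unset or empty.

```python
import os

# Variable names taken from the tutorial text.
REQUIRED_S3_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_ENDPOINT_URL")


def missing_s3_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_S3_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_s3_vars()
    if missing:
        print("Set these before using --upload-s3: " + ", ".join(missing))
    else:
        print("All S3 environment variables are set.")
```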

diff --git a/docs/bedhost/README.md b/docs/bedhost/README.md
@@ -1,4 +1,24 @@
-# BEDhost API guide
+<h1 align="center">bedhost</h1>
+
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+[![Github badge](https://img.shields.io/badge/source-github-354a75?logo=github)](https://github.com/databio/bedhost)
+
+
+`bedhost` is a Python FastAPI module for the API that powers BEDbase.
+It needs a path to the *bedbase configuration file*, which can be provided either via the `-c`/`--config` argument or read from the `$BEDBASE_CONFIG` environment variable. 
+
+---
+
+**Deployed public instance**: <a href="https://bedbase.org/" target="_blank">https://bedbase.org/</a>
+
+**Documentation**: <a href="https://docs.bedbase.org/" target="_blank">https://docs.bedbase.org/bedhost</a>
+
+**API**: <a href="https://api.bedbase.org/" target="_blank">https://api.bedbase.org/</a>
+
+**Source Code**: <a href="https://github.com/databio/bedhost/" target="_blank">https://github.com/databio/bedhost/</a>
+
+---
+
 
 ## Introduction
 

diff --git a/docs/bedhost/changelog.md b/docs/bedhost/changelog.md
@@ -2,6 +2,13 @@
 
 This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format. 
 
+## [0.3.0] -- 2024-03-01
+### Changed
+- switch to pydantic2
+- updated requirements
+- updated docs
+
+
 ## [0.2.0] -- 2023-10-17
 - remove all graphql
 - remove local static hosting of UI

diff --git a/docs/bedhost/dev-guide.md b/docs/bedhost/dev-guide.md
@@ -0,0 +1,35 @@
+# Developer Guide
+
+## Introduction
+
+### Data types
+
+BEDbase stores two types of data, which we call *records*. They are 1. BEDs, and 2. BEDsets. BEDsets are simply collections of BEDs. Each record in the database is either a BED or a BEDset.
+
+### Endpoint organization
+
+The endpoints are divided into 3 groups:
+
+1. `/bed` endpoints are used to interact with metadata for BED records.
+2. `/bedset` endpoints are used to interact with metadata for BEDset records.
+3. `/objects` endpoints are used to download metadata and get URLs to retrieve the underlying data itself. These endpoints implement the [GA4GH DRS standard](https://ga4gh.github.io/data-repository-service-schemas/).
+
+Therefore, to get information and statistics about BED or BEDset records, or what is contained in the database, look through the `/bed` and `/bedset` endpoints. But if you need to write a tool that gets the actual underlying files, then you'll need to use the `/objects` endpoints. The types of identifiers used in each case differ.
+
+## Record identifiers vs. object identifiers
+
+Each record has an identifier. For example, `eaf9ee97241f300f1c7e76e1f945141f` is a BED identifier. You can use this identifier for the metadata endpoints. To download files, you'll need something slightly different -- you need an *object identifier*. This is because each BED record includes multiple files, such as the original BED file, the BigBed file, analysis plots, and so on. To download a file, you will construct what we call the `object_id`, which identifies the specific file.
+
+## How to construct object identifiers
+
+Object IDs take the form `<record_type>.<record_identifier>.<result_id>`. An example of an object_id for a BED file is `bed.eaf9ee97241f300f1c7e76e1f945141f.bedfile`.
+
+So, you can get information about this object like this:
+
+`GET` [/objects/bed.eaf9ee97241f300f1c7e76e1f945141f.bedfile](/objects/bed.eaf9ee97241f300f1c7e76e1f945141f.bedfile)
+
+Or, you can get a URL to download the actual file with:
+
+`GET` [/objects/bed.eaf9ee97241f300f1c7e76e1f945141f.bedfile/access/http](/objects/bed.eaf9ee97241f300f1c7e76e1f945141f.bedfile/access/http)
+
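
The object identifier rule in the developer guide can be sketched in a few lines of Python. The helper names are hypothetical. The sketch also assumes the `/objects` endpoints are served from the public API URL given in the bedhost README, which the guide itself does not state. It only builds strings and makes no requests.

```python
# Public API URL listed in the bedhost README; assumed to host /objects.
API_BASE = "https://api.bedbase.org"


def build_object_id(record_type, record_identifier, result_id):
    """Join the parts as <record_type>.<record_identifier>.<result_id>."""
    return f"{record_type}.{record_identifier}.{result_id}"


def metadata_url(object_id, base=API_BASE):
    """URL for information about the object."""
    return f"{base}/objects/{object_id}"


def access_url(object_id, access_id="http", base=API_BASE):
    """URL that returns a download location for the underlying file."""
    return f"{base}/objects/{object_id}/access/{access_id}"


# Example identifier taken from the developer guide.
object_id = build_object_id("bed", "eaf9ee97241f300f1c7e76e1f945141f", "bedfile")
print(metadata_url(object_id))
print(access_url(object_id))
```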
+
diff --git a/docs/geniml/README.md b/docs/geniml/README.md
@@ -1,7 +1,10 @@
-# <img src="img/geniml_logo_horizontal.svg" class="img-header">
+<p align="center">
+<img align="center" src="img/geniml_logo_horizontal.svg" class="img-header" height="100">
+</p>
+
 
 <p align="center">
-<a href="https://img.shields.io/pypi/v/geniml"><img src="https://img.shields.io/pypi/v/geniml"></a>
+<a href="https://img.shields.io/pypi/v/geniml"><img src="https://img.shields.io/pypi/v/geniml" alt=""></a>
 <a href="https://github.com/databio/geniml"><img src="https://img.shields.io/badge/source-github-354a75?logo=github"></a>
 </p>
 

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -47,6 +47,7 @@ nav:
     - API guides:
       - BEDhost API guide:
         - BEDhost: bedhost/README.md
+        - Developer Guide: bedhost/dev-guide.md
         - Changelog: bedhost/changelog.md
       - BBConf:
         - BBConf: bbconf/README.md
@@ -63,8 +64,8 @@ nav:
   - BEDboss:
     - BEDBoss: bedboss/README.md
     - Tutorial:
-      - BEDboss-all pipeline: bedboss/tutorials/tutorial_all.md
       - BEDboss insert: bedboss/tutorials/tutorial_insert.md
+      - BEDboss-all pipeline: bedboss/tutorials/tutorial_all.md
       - BEDmaker tutorial: bedboss/tutorials/bedmaker_tutorial.md
       - BEDqc tutorial: bedboss/tutorials/bedqc_tutorial.md
       - BEDstat tutorial: bedboss/tutorials/bedstat_tutorial.md