Commit

Merge branch 'bedboss'
khoroshevskyi committed Mar 1, 2024
2 parents f5e4422 + d102f99 commit a81c1f8
Showing 8 changed files with 131 additions and 17 deletions.
30 changes: 29 additions & 1 deletion docs/bedboss/tutorials/bedbuncher_tutorial.md
Original file line number Diff line number Diff line change
@@ -1 +1,29 @@
### 🚧 Tutorial in progress! Stay tuned for updates. We're working hard to bring you valuable content soon!
### BEDbuncher

BEDbuncher creates a bedset from BED files that are already stored in the BEDbase database.

### 1) Create bedbase config file
### 2) Create a PEP with BED file record identifiers
The PEP sample table needs either a single `sample_name` column whose values are the record identifiers, or both a `sample_name` column and a `record_identifier` column.
For example, a sample table with both columns:

| sample_name | record_identifier |
|----------|----------|
| sample1 | asdf3215f34 |
| sample2 | a23452f34tf |
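
A sample table like the one above could be generated programmatically. The sketch below is only an illustration using the Python standard library: it assumes the PEP reads its sample table from a CSV file, and the function name `sample_table_csv` and the row values are placeholders, not part of bedboss.

```python
import csv
import io

# Placeholder rows copied from the example table; replace with real record identifiers.
ROWS = [
    {"sample_name": "sample1", "record_identifier": "asdf3215f34"},
    {"sample_name": "sample2", "record_identifier": "a23452f34tf"},
]


def sample_table_csv(rows):
    """Return the sample table as CSV text with a header row (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["sample_name", "record_identifier"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(sample_table_csv(ROWS))
```

Writing the returned text to a file and pointing the PEP config at that file would then give `bedboss bunch` a table in the shape described above.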

### 3) Run bedboss bunch
#### From command line
```bash
bedboss bunch \
--bedbase-config path/to/bedbase_config.yaml \
--bedset-name bedset1 \
--pep path/to/pep.yaml \
--bedset-pep bedset_pep.yaml \
--cache-path CACHE_PATH
```

#### From within Python
```python

```
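
The Python block above is empty in this commit, and the bedboss Python call for bunching is not shown anywhere in the tutorial. As a stopgap, the command-line form can be driven from Python. The sketch below only reuses the flags from the CLI example; `build_bunch_command` is a hypothetical helper, not a bedboss function, and the paths are the same placeholders as in the shell example.

```python
import shlex


def build_bunch_command(bedbase_config, bedset_name, pep, bedset_pep, cache_path):
    """Assemble the `bedboss bunch` argument list from the flags in the CLI example."""
    return [
        "bedboss", "bunch",
        "--bedbase-config", bedbase_config,
        "--bedset-name", bedset_name,
        "--pep", pep,
        "--bedset-pep", bedset_pep,
        "--cache-path", cache_path,
    ]


command = build_bunch_command(
    bedbase_config="path/to/bedbase_config.yaml",
    bedset_name="bedset1",
    pep="path/to/pep.yaml",
    bedset_pep="bedset_pep.yaml",
    cache_path="CACHE_PATH",
)

# Show the command that would be executed; nothing is run here.
print(shlex.join(command))
```

Passing `command` to `subprocess.run(command, check=True)` would execute it, provided bedboss is installed and on the `PATH`.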
22 changes: 21 additions & 1 deletion docs/bedboss/tutorials/bedindex_tutorial.md
@@ -1 +1,21 @@
### 🚧 Tutorial in progress! Stay tuned for updates. We're working hard to bring you valuable content soon!
### Indexing to qdrant database

### 1. Create bedbase config file
### 2. Run bedboss index

#### From command line
```bash
bedboss index --bedbase-config path/to/bedbase_config.yaml
```

After running this command, all files that are in the database but have not yet been indexed will be indexed into the Qdrant database.


#### From within Python
```python
from bedboss.qdrant_index import add_to_qdrant

add_to_qdrant(
bedbase_config="path/to/bedbase_config.yaml"
)
```
22 changes: 11 additions & 11 deletions docs/bedboss/tutorials/tutorial_insert.md
@@ -1,21 +1,21 @@
## Bedboss insert

Bedboss insert is intended to run each sample in provided PEP.
PEP can be provided as a file or as a registry path of the PEPhub.
Bedboss insert is designed to process each sample in the provided PEP.
The PEP can be provided either as a path to config file or as a registry path of the PEPhub.


### Step 1: Install all dependencies

First you have to install bedboss and check if all requirements are satisfied.
To do so, you can run next command:
First, you have to install bedboss and check if all requirements are satisfied.
To do so, you can run the following command:
```bash
bedboss requirements-check
```
If requirements are not satisfied, you will see the list of missing packages.

### Step 2: Create bedconf.yaml file
To run bedboss insert, you need to create a bedconf.yaml file with configuration.
Detail instructions are in the configuration section.
Detailed instructions are in the configuration section.

### Step 3: Create PEP with bed files.
BEDboss PEP should contain the following fields: sample_name, input_file, input_type, genome.
@@ -33,14 +33,14 @@ bedboss insert \

```

Above command will run bedboss on the bed file and create a bedstat file in the output directory.
The command above runs bedboss on the BED file and creates a file with statistics in the output directory.
It contains only required parameters. For more details, please check the usage section.

By default, results will be uploaded only to postgres database.
- To upload results to PEPhub, you need to make `databio` org available on GitHub, then login to PEPhub, and add `--upload-pephub` flag to the command.
- To upload results to Qdrant, you need to add `--upload-qdrant` flag to the command.
- To upload actual files to s3, you need to add `--upload-s3` flag to the command, and Before uploading you have to set up all necessary env vars: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_ENDPOINT_URL.
- To create bedset of provided pep files, you need to add `--create-bedset` flag to the command.
By default, results will be uploaded only to the PostgreSQL database.
- To upload results to PEPhub, you need to make the `databio` org available on GitHub, then log in to PEPhub, and add the `--upload-pephub` flag to the command.
- To upload results to Qdrant, you need to add the `--upload-qdrant` flag to the command.
- To upload actual files to S3, you need to add the `--upload-s3` flag to the command, and before uploading, you have to set up all necessary environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_ENDPOINT_URL.
- To create a bedset of provided pep files, you need to add the `--create-bedset` flag to the command.
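
Because the S3 upload depends on three environment variables, it can help to check them before starting a long run. The following is a minimal sketch using only the standard library; `missing_s3_vars` is a hypothetical helper and is not part of bedboss.

```python
import os

# Variable names taken from the list above.
REQUIRED_S3_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_ENDPOINT_URL")


def missing_s3_vars(environ=None):
    """Return the names of required S3 variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_S3_VARS if not env.get(name)]


missing = missing_s3_vars()
if missing:
    print("Set these variables before using --upload-s3:", ", ".join(missing))
else:
    print("All S3 environment variables are set.")
```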


---
22 changes: 21 additions & 1 deletion docs/bedhost/README.md
@@ -1,4 +1,24 @@
# BEDhost API guide
<h1 align="center">bedhost</h1>

[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Github badge](https://img.shields.io/badge/source-github-354a75?logo=github)](https://github.com/databio/bedhost)


`bedhost` is a Python FastAPI module for the API that powers BEDbase.
It needs a path to the *bedbase configuration file*, which can be provided either via the `-c`/`--config` argument or read from the `$BEDBASE_CONFIG` environment variable.
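
The lookup described above can be pictured with a small sketch. This is not bedhost's actual code: `resolve_config_path` is a hypothetical function, and the precedence shown (command-line argument first, then the environment variable) is an assumption that the README does not state.

```python
import argparse
import os


def resolve_config_path(argv=None, environ=None):
    """Pick the config path from -c/--config, falling back to $BEDBASE_CONFIG."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-c", "--config", default=None)
    # Ignore any other arguments; only the config option matters for this sketch.
    args, _unknown = parser.parse_known_args(argv)
    path = args.config or env.get("BEDBASE_CONFIG")
    if not path:
        raise ValueError("Provide -c/--config or set the BEDBASE_CONFIG environment variable")
    return path


print(resolve_config_path(["-c", "path/to/bedbase_config.yaml"], {}))
```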

---

**Deployed public instance**: <a href="https://bedbase.org/" target="_blank">https://bedbase.org/</a>

**Documentation**: <a href="https://docs.bedbase.org/" target="_blank">https://docs.bedbase.org/bedhost</a>

**API**: <a href="https://api.bedbase.org/" target="_blank">https://api.bedbase.org/</a>

**Source Code**: <a href="https://github.com/databio/bedhost/" target="_blank">https://github.com/databio/bedhost/</a>

---


## Introduction

7 changes: 7 additions & 0 deletions docs/bedhost/changelog.md
@@ -2,6 +2,13 @@

This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format.

## [0.3.0] -- 2024-03-01
### Changed
- switch to pydantic2
- updated requirements
- updated docs


## [0.2.0] -- 2023-10-17
- remove all graphql
- remove local static hosting of UI
35 changes: 35 additions & 0 deletions docs/bedhost/dev-guide.md
@@ -0,0 +1,35 @@
# Developer Guide

## Introduction

### Data types

BEDbase stores two types of data, which we call *records*. They are 1. BEDs, and 2. BEDsets. BEDsets are simply collections of BEDs. Each record in the database is either a BED or a BEDset.

### Endpoint organization

The endpoints are divided into 3 groups:

1. `/bed` endpoints are used to interact with metadata for BED records.
2. `/bedset` endpoints are used to interact with metadata for BEDset records.
3. `/objects` endpoints are used to download metadata and get URLs to retrieve the underlying data itself. These endpoints implement the [GA4GH DRS standard](https://ga4gh.github.io/data-repository-service-schemas/).

Therefore, to get information and statistics about BED or BEDset records, or what is contained in the database, look through the `/bed` and `/bedset` endpoints. But if you need to write a tool that gets the actual underlying files, then you'll need to use the `/objects` endpoints. The types of identifiers used in each case differ.

## Record identifiers vs. object identifiers

Each record has an identifier. For example, `eaf9ee97241f300f1c7e76e1f945141f` is a BED identifier. You can use this identifier for the metadata endpoints. To download files, you'll need something slightly different -- you need an *object identifier*. This is because each BED record includes multiple files, such as the original BED file, the BigBed file, analysis plots, and so on. To download a file, you will construct what we call the `object_id`, which identifies the specific file.

## How to construct object identifiers

Object IDs take the form `<record_type>.<record_identifier>.<result_id>`. An example of an object_id for a BED file is `bed.eaf9ee97241f300f1c7e76e1f945141f.bedfile`.

So, you can get information about this object like this:

`GET` [/objects/bed.eaf9ee97241f300f1c7e76e1f945141f.bedfile](/objects/bed.eaf9ee97241f300f1c7e76e1f945141f.bedfile)

Or, you can get a URL to download the actual file with:

`GET` [/objects/bed.eaf9ee97241f300f1c7e76e1f945141f.bedfile/access/http](/objects/bed.eaf9ee97241f300f1c7e76e1f945141f.bedfile/access/http)
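
The construction above can be sketched in a few lines of Python. The helper names are hypothetical, and the base URL is an assumption: it is the public API root listed in the bedhost README, and the paths are simply appended to it. The sketch only builds strings and makes no network requests.

```python
API_BASE = "https://api.bedbase.org"  # assumed API root; adjust for your deployment


def make_object_id(record_type, record_identifier, result_id):
    """Join the three parts into `<record_type>.<record_identifier>.<result_id>`."""
    return f"{record_type}.{record_identifier}.{result_id}"


def metadata_url(object_id, base=API_BASE):
    """URL that returns information about the object."""
    return f"{base}/objects/{object_id}"


def access_url(object_id, access_id="http", base=API_BASE):
    """URL that returns a download location for the object."""
    return f"{base}/objects/{object_id}/access/{access_id}"


object_id = make_object_id("bed", "eaf9ee97241f300f1c7e76e1f945141f", "bedfile")
print(object_id)
print(metadata_url(object_id))
print(access_url(object_id))
```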


7 changes: 5 additions & 2 deletions docs/geniml/README.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,10 @@
# <img src="img/geniml_logo_horizontal.svg" class="img-header">
<p align="center">
<img align="center" src="img/geniml_logo_horizontal.svg" class="img-header" height="100">
</p>


<p align="center">
<a href="https://img.shields.io/pypi/v/geniml"><img src="https://img.shields.io/pypi/v/geniml"></a>
<a href="https://img.shields.io/pypi/v/geniml"><img src="https://img.shields.io/pypi/v/geniml" alt=""></a>
<a href="https://github.com/databio/geniml"><img src="https://img.shields.io/badge/source-github-354a75?logo=github"></a>
</p>

3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -47,6 +47,7 @@ nav:
- API guides:
- BEDhost API guide:
- BEDhost: bedhost/README.md
- Developer Guide: bedhost/dev-guide.md
- Changelog: bedhost/changelog.md
- BBConf:
- BBConf: bbconf/README.md
@@ -63,8 +64,8 @@ nav:
- BEDboss:
- BEDBoss: bedboss/README.md
- Tutorial:
- BEDboss-all pipeline: bedboss/tutorials/tutorial_all.md
- BEDboss insert: bedboss/tutorials/tutorial_insert.md
- BEDboss-all pipeline: bedboss/tutorials/tutorial_all.md
- BEDmaker tutorial: bedboss/tutorials/bedmaker_tutorial.md
- BEDqc tutorial: bedboss/tutorials/bedqc_tutorial.md
- BEDstat tutorial: bedboss/tutorials/bedstat_tutorial.md
