README updated
saanikat committed Oct 7, 2024
1 parent e6ace12 commit 732b505
Showing 1 changed file with 14 additions and 19 deletions.
33 changes: 14 additions & 19 deletions README.md
@@ -16,10 +16,15 @@ pip install git+https://github.com/databio/bedms.git
## Usage

### Standardizing based on available schemas

+To choose a schema to standardize against, refer to the [HuggingFace repository](https://huggingface.co/databio/attribute-standardizer-model6). Based on the schema design `.yaml` files there, select the schema that best represents your attributes. The example below uses the `encode` schema.

```python
from bedms import AttrStandardizer

-model = AttrStandardizer("ENCODE")
+model = AttrStandardizer(
+    repo_id="databio/attribute-standardizer-model6", model_name="encode"
+)
results = model.standardize(pep="geo/gse228634:default")

assert results
@@ -33,9 +38,9 @@ Training your custom schema is very easy with `BEDMS`. You would need two things
To instantiate `TrainStandardizer` class:

```python
-from bedms.train import TrainStandardizer
+from bedms.train import AttrStandardizerTrainer

-trainer = TrainStandardizer("training_config.yaml")
+trainer = AttrStandardizerTrainer("training_config.yaml")

```
To load the datasets and encode them:
@@ -63,26 +68,16 @@ trainer.plot_visualizations()
```

### Standardizing based on custom schema
-For standardizing based on custom schema, you would require a `custom_config.yaml`.

+To standardize based on a custom schema, your model must be hosted on HuggingFace. The repository's directory structure should follow the instructions on [HuggingFace](https://huggingface.co/databio/attribute-standardizer-model6).

```python
from bedms import AttrStandardizer

-model = AttrStandardizer("CUSTOM", "custom_config.yaml")

+model = AttrStandardizer(
+    repo_id="name/of/your/hf/repo", model_name="model/name"
+)
results = model.standardize(pep="geo/gse228634:default")

assert results
```
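
The call-style change in this commit (a positional schema name such as `"ENCODE"` giving way to `repo_id` and `model_name` keywords) could be bridged in existing scripts with a small translation helper. The sketch below is not part of `bedms`: `legacy_schema_to_kwargs`, `DEFAULT_REPO_ID`, and `LEGACY_SCHEMAS` are hypothetical names, and it assumes that each model name is the lowercase form of the old schema name, which this diff shows only for `encode`.

```python
# Hypothetical migration helper; none of these names exist in bedms.
DEFAULT_REPO_ID = "databio/attribute-standardizer-model6"

# Schema names taken from the AVAILABLE_SCHEMAS listing that this commit removes.
LEGACY_SCHEMAS = ("ENCODE", "FAIRTRACKS", "BEDBASE")


def legacy_schema_to_kwargs(schema: str, repo_id: str = DEFAULT_REPO_ID) -> dict:
    """Translate an old positional schema name into the new keyword arguments."""
    name = schema.strip().upper()
    if name not in LEGACY_SCHEMAS:
        raise ValueError(f"Unknown legacy schema: {schema!r}")
    # Assumption: model_name is the lowercase schema name, as with "encode".
    return {"repo_id": repo_id, "model_name": name.lower()}


kwargs = legacy_schema_to_kwargs("ENCODE")
print(kwargs)
# With bedms installed, the result could be passed on as:
#     model = AttrStandardizer(**kwargs)
```

The old `"CUSTOM"` form has no fixed repository, so the helper rejects it with a `ValueError` and leaves the caller to supply `repo_id` and `model_name` directly.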

-### Available schemas
-To see the available schemas, you can run:
-```
-from bedms.const import AVAILABLE_SCHEMAS
-print(AVAILABLE_SCHEMAS)
-# >> ['ENCODE', 'FAIRTRACKS', 'BEDBASE']
-```

-AVAILABLE_SCHEMAS is a list of available schemas that you can use to standardize your metadata.
-```
