Bioframe: Operations on Genomic Interval Dataframes

Bioframe enables flexible and scalable operations on genomic interval dataframes in Python.

Bioframe is built directly on top of Pandas. Bioframe provides:

A variety of genomic interval operations that work directly on dataframes.
Operations for special classes of genomic intervals, including chromosome arms and fixed-size bins.
Conveniences for diverse tabular genomic data formats and loading genome assembly summary information.

Read the docs, including the guide, as well as the bioframe preprint for more information.

Bioframe is an Affiliated Project of NumFOCUS.

Installation

Bioframe is available on PyPI and bioconda:

pip install bioframe

Contributing

Interested in contributing to bioframe? That's great! To get started, check out the contributing guide. Discussions about the project roadmap take place on the Open2C Slack and in regular developer meetings scheduled there. Anyone can join and participate!

Interval operations

Key genomic interval operations in bioframe include:

overlap: Find pairs of overlapping genomic intervals between two dataframes.
closest: For every interval in a dataframe, find the closest intervals in a second dataframe.
cluster: Group overlapping intervals in a dataframe into clusters.
complement: Find genomic intervals that are not covered by any interval from a dataframe.

Bioframe also provides functions that are frequently used in genomic interval analysis and can be expressed as combinations of these core operations and dataframe operations, including coverage, expand, merge, select, and subtract.

To overlap two dataframes, call:

import bioframe as bf

bf.overlap(df1, df2)

For these two input dataframes, with intervals all on the same chromosome:

overlap will return the following interval pairs as overlaps:

To merge all overlapping intervals in a dataframe, call:

import bioframe as bf

bf.merge(df1)

For this input dataframe, with intervals all on the same chromosome:

merge will return a new dataframe with these merged intervals:

See the guide for visualizations of other interval operations in bioframe.

File I/O

Bioframe includes utilities for reading genomic file formats into dataframes and vice versa. One handy function is read_table, which mirrors pandas’s read_csv/read_table but provides a schema argument to populate column names for common tabular file formats.

import bioframe

jaspar_url = 'http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/hg38/MA0139.1.tsv.gz'
ctcf_motif_calls = bioframe.read_table(jaspar_url, schema='jaspar', skiprows=1)

Tutorials

See this jupyter notebook for an example of how to assign TF motifs to ChIP-seq peaks using bioframe.

Citing

If you use bioframe in your work, please cite:

@article{bioframe_2024,
author = {Open2C and Abdennur, Nezar and Fudenberg, Geoffrey and Flyamer, Ilya M and Galitsyna, Aleksandra A and Goloborodko, Anton and Imakaev, Maxim and Venev, Sergey},
doi = {10.1093/bioinformatics/btae088},
journal = {Bioinformatics},
title = {{Bioframe: Operations on Genomic Intervals in Pandas Dataframes}},
year = {2024}
}
