
Adding b37 genome

Peter Kerpedjiev edited this page Jan 17, 2017 · 2 revisions

Adding a genome (using b37 as an example)

From the repository's root directory:

mkdir negspy/data/b37

We need to create two new files:

negspy/data/b37/chromInfo.txt
negspy/data/b37/chromOrder.txt

The b37 genome is a little awkward because it is not on UCSC's list of genomes. To get the chromosome sizes, download the reference genome bundle from the Broad Institute, extract it, and look in the bwa1/Homo_sapiens_assembly19.fasta.ann file, which lists the chromosome names and their sizes. Place these in negspy/data/b37/chromInfo.txt, in the directory created above:

1   249250621
2   243199373
3   198022430
4   191154276
...
Y   59373566
19  59128983
22  51304566
21  48129895
MT  16569
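Extracting the names and sizes by hand is error-prone, so here is a minimal Python sketch of one way to do it. It assumes the usual BWA .ann layout (a header line, then two lines per sequence: one with the name in the second field and one with the length in the second field) and assumes chromInfo.txt is tab-separated. The function names are hypothetical and not part of negspy.

```python
def parse_bwa_ann(lines):
    """Yield (name, length) pairs from the lines of a BWA .ann file.

    Assumed layout: one header line, then two lines per sequence:
      "<gi> <name> [comment...]"
      "<offset> <length> <n_ambiguous>"
    """
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    records = lines[1:]  # skip the header line
    for name_line, size_line in zip(records[0::2], records[1::2]):
        name = name_line.split()[1]
        length = int(size_line.split()[1])
        yield name, length


def write_chrom_info(pairs, path):
    """Write name/size pairs as tab-separated lines (assumed format)."""
    with open(path, "w") as out:
        for name, length in pairs:
            out.write("{}\t{}\n".format(name, length))
```

To use it, open the .ann file, pass the file object to parse_bwa_ann, and hand the result to write_chrom_info with negspy/data/b37/chromInfo.txt as the path. The bundle also contains non-chromosome contigs, so the output may need trimming to the entries you want.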

The order of the lines in chromInfo.txt does not matter, because the chromosome order is specified separately in negspy/data/b37/chromOrder.txt, one name per line:

1
2
3
...
22
X
Y
MT
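Since the two files are maintained separately, it can help to confirm that every name in chromOrder.txt has a size in chromInfo.txt before running the tests. The following is a minimal sketch of such a check. It assumes whitespace-separated name and size columns, and the helper names are hypothetical rather than part of negspy.

```python
def read_chrom_info(lines):
    """Map chromosome name -> size from chromInfo-style lines."""
    sizes = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # ignore blank or malformed lines
            sizes[parts[0]] = int(parts[1])
    return sizes


def missing_from_info(order_lines, sizes):
    """Return names listed in chromOrder that have no size entry."""
    names = [line.strip() for line in order_lines if line.strip()]
    return [name for name in names if name not in sizes]
```

Both functions accept any iterable of lines, so open file objects for the two files can be passed in directly. An empty list from missing_from_info means the files agree.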

Now we can add a test to test/coordinates_test.py and make sure it works. Run the test using:

nosetests test/
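As a rough illustration of the arithmetic such a test could check, the sketch below derives cumulative chromosome offsets from the order and size information. It assumes that a genome-wide position is the chromosome's offset (the summed sizes of all chromosomes before it in chromOrder.txt) plus the position within the chromosome. It does not call negspy, the function names are hypothetical, and it takes no stance on 0-based versus 1-based positions.

```python
def cumulative_offsets(order, sizes):
    """Return ({name: offset}, total_length) for chromosomes in order."""
    offsets = {}
    total = 0
    for name in order:
        offsets[name] = total  # start of this chromosome in genome space
        total += sizes[name]
    return offsets, total


def genome_position(chrom, pos, offsets):
    """Convert a within-chromosome position to a genome-wide position."""
    return offsets[chrom] + pos
```

With the b37 sizes listed above for chromosomes 1 and 2, chromosome 3 would start at offset 249250621 + 243199373 = 492449994 under this assumption.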

Finally, we can commit our changes, bump the version, and upload to PyPI:

bumpversion minor
python setup.py sdist upload -r pypi