Code used to derive a regulatory Vocabulary of DHSs using NMF

Altius/DHSVocabulary

DHSVocabulary

A framework for analyzing ENCODE DHS (DNase I hypersensitive site) presence/absence data with non-negative matrix factorization (NMF). The OONMF.py library provides a wrapper around the scikit-learn NMF routine, along with several functions for analyzing the resulting decomposed matrices. OONMFhelpers.py and OONMFmetadata.py provide additional functions useful for analysis.
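As a rough illustration of the decomposition described above, the following minimal sketch calls the scikit-learn NMF routine directly on a small random binary matrix. It does not use OONMF.py; the matrix sizes, the random data, and the variable names are assumptions made for the example, and the NNDSVD initialization is inferred only from the name of the compute script.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy stand-in for the presence/absence matrix (samples x DHSs).
# The sizes here are illustrative, not those of the real data set.
rng = np.random.default_rng(0)
n_samples, n_dhs, k = 20, 200, 4
X = (rng.random((n_samples, n_dhs)) < 0.2).astype(float)

# Factorize X ~= basis @ components with k non-negative components.
# init="nndsvd" is an assumption based on the script name.
model = NMF(n_components=k, init="nndsvd", max_iter=500, random_state=0)
basis = model.fit_transform(X)    # shape: (n_samples, k)
components = model.components_    # shape: (k, n_dhs)

# Transposing gives one row per DHS and one column per component.
mixture = components.T            # shape: (n_dhs, k)

# Relative Frobenius reconstruction error of the factorization.
rel_error = np.linalg.norm(X - basis @ components) / np.linalg.norm(X)
print(basis.shape, mixture.shape, round(float(rel_error), 3))
```

Both factors are non-negative by construction, so each sample and each DHS can be read as an additive combination of the k components.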

To reproduce the run from the DHS Vocabulary paper (k=16), download this repository and the DHS presence/absence matrix dat_bin_FDR01_hg38.txt.gz (733 samples x 3.59e6 DHSs; available here), then run:

python OONMF_compute_presence_NNDSVD_O.py

The output is the Basis (733 samples x 16 components) and Mixture (3.59e6 DHSs x 16 components) matrices, saved in numpy binary format. For tab-separated output, uncomment the last line of the script.
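If only the numpy binary files are at hand, they can also be converted to tab-separated text after the fact. The sketch below shows one way to do that with plain numpy. The file names are hypothetical placeholders (the script's actual output names are not stated here), and a small random matrix stands in for the real Basis matrix.

```python
import tempfile
from pathlib import Path

import numpy as np


def npy_to_tsv(npy_path, tsv_path):
    """Load a matrix saved with np.save and write it as tab-separated text."""
    matrix = np.load(npy_path)
    np.savetxt(tsv_path, matrix, delimiter="\t", fmt="%.6g")
    return matrix.shape


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names, used only for this demonstration.
    npy_file = Path(tmp) / "example_basis.npy"
    tsv_file = Path(tmp) / "example_basis.tsv"

    # Stand-in for a Basis matrix (samples x components).
    demo = np.random.default_rng(0).random((5, 3))
    np.save(npy_file, demo)

    shape = npy_to_tsv(npy_file, tsv_file)
    roundtrip = np.loadtxt(tsv_file, delimiter="\t")

print(shape, roundtrip.shape)
```

Note that the text form of the full Mixture matrix (3.59e6 rows) will be considerably larger than the binary file.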

Note: requires Python 3 with scikit-learn, numpy, scipy, matplotlib, and pandas installed.
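A quick way to confirm that the listed packages are importable before starting a long run is sketched below. This check is not part of the repository; it only looks up each package by its import name (scikit-learn is imported as sklearn).

```python
import importlib.util
import sys

# Import names of the required packages (scikit-learn imports as "sklearn").
REQUIRED = ["sklearn", "numpy", "scipy", "matplotlib", "pandas"]

# find_spec returns None for a top-level module that cannot be located.
missing = [name for name in REQUIRED if importlib.util.find_spec(name) is None]

if sys.version_info[0] < 3:
    print("Python 3 is required")
elif missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages found")
```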
