
Network-based module discovery algorithm with high rate of empirically-validated term calls


Shamir-Lab/DOMINO


DOMINO

DOMINO: Discovery of Modules In Networks using Omics.

DOMINO is an active module identification (AMI) algorithm. It receives a gene network and the nodes' activity scores as input and reports sub-networks (modules) that are putatively biologically meaningful in the context of the activity data.

In an extensive evaluation conducted on gene expression and genome-wide association study data, we discovered that AMI algorithms tended to over-report enrichment: GO terms enriched in the modules on real data were often also enriched when the algorithms were run on randomly permuted activity scores.

In contrast, modules retrieved by DOMINO had a high rate of empirically validated GO terms.

The study is available at https://www.embopress.org/doi/full/10.15252/msb.20209593.

Requirements

DOMINO was tested under the following settings:

  • Python 3.8 (note that for later versions of Python, some dependency packages are currently not available via pip)
  • Linux OS (Ubuntu 14.04 LTS, Ubuntu 18.04.4 LTS)

Installation

From pip

We recommend using a virtual environment. For example:

python3 -m venv domino-env
source domino-env/bin/activate

Then, install domino via pip:

pip install domino-python

From conda (Bioconda)

Make sure the Bioconda repository and its dependencies are available:

conda config --add channels defaults
conda config --add channels conda-forge 
conda config --add channels bioconda

Create a virtual environment in conda. For example:

conda create --name domino-env
conda activate domino-env

Then, install domino via conda:

conda install domino

From source

Download the source files and install them as follows.

Clone the repo from GitHub:

git clone https://github.com/Shamir-Lab/DOMINO.git
cd DOMINO

DOMINO is written in Python 3. The necessary libraries will all be installed by the setup.py script. We recommend using a virtual environment. For example:

python3 -m venv domino-env
source domino-env/bin/activate

Then, run setup.py:

python setup.py install

Input File Formats

  • A network file should be in a simplified sif format:

    • Only a single node should appear in the first column and in the last column.
    • The first row contains headers.
  • An active gene file contains the gene IDs in Ensembl format, separated by newline characters.

  • The slices file is generated automatically by the slicer command.

For examples, see the files in the "examples" folder.
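The formats above can be illustrated with a minimal Python sketch that writes a toy network file and a toy active gene file. The helper names, the tab delimiter, the header labels, and the "ppi" interaction label are assumptions made for illustration and are not taken from DOMINO; the files in the "examples" folder remain the reference for the exact layout.

```python
import csv
from pathlib import Path


def write_network_sif(path, edges, header=("node_a", "interaction", "node_b")):
    """Write edges to a simplified sif-like file with one header row.

    Each edge is (first_node, interaction_label, last_node), so exactly one
    node sits in the first column and one in the last column.
    Assumption: columns are tab-separated.
    """
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(header)  # first row is headers
        writer.writerows(edges)


def read_network_edges(path):
    """Read the file back, skipping the header row, as (first, last) pairs."""
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh, delimiter="\t"))
    return [(row[0], row[-1]) for row in rows[1:] if row]


def write_active_genes(path, gene_ids):
    """Write one Ensembl gene ID per line."""
    Path(path).write_text("\n".join(gene_ids) + "\n")


if __name__ == "__main__":
    # Toy data only: three Ensembl gene IDs joined by placeholder edges.
    toy_edges = [
        ("ENSG00000141510", "ppi", "ENSG00000012048"),
        ("ENSG00000012048", "ppi", "ENSG00000139618"),
    ]
    write_network_sif("toy_network.sif", toy_edges)
    write_active_genes("toy_active_genes.txt", ["ENSG00000141510", "ENSG00000139618"])
    print(read_network_edges("toy_network.sif"))
```

Reading the file back through read_network_edges is a quick way to check that every row really has a single node in its first and last columns before passing the file to slicer.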

Basic Usage

To run preprocessing step 0 (partitioning the network using the Louvain algorithm):

slicer --network_file </path/to/network.sif> --output_file </path/to/output_file>

-n/--network_file: A path to network file (sif format). e.g., /path/to/network_file.sif.

-o/--output_file: A path to the output slices file. e.g., /path/to/output/slices_file.txt.

To run DOMINO:

domino --active_genes_files </path/to/dataset1,/path/to/dataset2...> --network_file </path/to/network.sif> --slices_file <slices_file.txt> --output_folder </path/to/output_folder> [-sth <slices_threshold> -mth <putative_modules_threshold>]

The common command line options are:

-a/--active_genes_files: Comma-delimited list of absolute paths to files, each containing a list of active genes separated by a newline character (\n). e.g., /path/to/active_genes_files_1,/path/to/active_genes_files_2.

-n/--network_file: A path to network file (sif format). e.g., /path/to/network_file.sif.

-s/--slices_file: A path to the slices file (i.e., the output of the "slicer" script). e.g., /path/to/slices_file.txt.
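For scripted runs, the two commands can be assembled from Python. The sketch below only builds the argument lists from the options shown in this README (the helper names slicer_command and domino_command are hypothetical, and the paths and threshold values in the demo are placeholders, not recommended settings); it does not call any DOMINO Python API.

```python
import shlex


def slicer_command(network_file, slices_file):
    """Argument list for the preprocessing step (slicer)."""
    return [
        "slicer",
        "--network_file", str(network_file),
        "--output_file", str(slices_file),
    ]


def domino_command(active_genes_files, network_file, slices_file, output_folder,
                   slices_threshold=None, module_threshold=None):
    """Argument list for a DOMINO run.

    active_genes_files is a list of paths; the command line expects them
    joined into one comma-delimited value.
    """
    cmd = [
        "domino",
        "--active_genes_files", ",".join(str(p) for p in active_genes_files),
        "--network_file", str(network_file),
        "--slices_file", str(slices_file),
        "--output_folder", str(output_folder),
    ]
    # The thresholds are optional, so they are added only when given.
    if slices_threshold is not None:
        cmd += ["-sth", str(slices_threshold)]
    if module_threshold is not None:
        cmd += ["-mth", str(module_threshold)]
    return cmd


if __name__ == "__main__":
    # Placeholder paths and values, for illustration only.
    print(shlex.join(slicer_command("/path/to/network.sif", "/path/to/slices_file.txt")))
    print(shlex.join(domino_command(
        ["/path/to/dataset1", "/path/to/dataset2"],
        "/path/to/network.sif",
        "/path/to/slices_file.txt",
        "/path/to/output_folder",
        slices_threshold=0.3,
    )))
```

With DOMINO installed and on the PATH, either list can be passed to subprocess.run(cmd, check=True) to execute the step.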

Advanced usage

-c/--use_cache: Use auto-generated cache network files (*.pkl) from previous executions with the same network. NOTE: (1) THIS IS NOT THE SLICES FILE! (2) If the content of the network file has changed, you should set this option to "false".

-p/--parallelization: The number of threads allocated to the run (a single thread is usually enough).

-v/--visualization: Indicates whether a visualization of the modules should be generated.

-sth/--slices_threshold: The threshold for considering a slice as relevant.

-mth/--module_threshold: The threshold for considering a putative module as a final module.

Main output files

output_folder/active_gene_file_name/modules.out: list of final modules

output_folder/active_gene_file_name/module_i.html: visualization of the i'th module
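After a run, the output files can be located with a short Python sketch such as the one below. It relies only on the naming pattern listed above; the helper name collect_outputs is hypothetical, and the exact sub-folder name derived from the active gene file (here passed in as run_name) is left to the caller because this README does not specify it. The internal format of modules.out is also not described here, so the sketch does not parse it.

```python
import re
from pathlib import Path


def collect_outputs(output_folder, run_name):
    """Return (path_to_modules_out, [module_i.html paths sorted by i]).

    run_name is the sub-folder created for one active gene file.
    """
    run_dir = Path(output_folder) / run_name
    modules_file = run_dir / "modules.out"

    pattern = re.compile(r"module_(\d+)\.html$")
    pages = []
    for page in run_dir.glob("module_*.html"):
        match = pattern.match(page.name)
        if match:
            # Keep the numeric index so module_10 sorts after module_2.
            pages.append((int(match.group(1)), page))
    pages.sort()
    return modules_file, [page for _, page in pages]


if __name__ == "__main__":
    # Placeholder folder names, for illustration only.
    modules_file, pages = collect_outputs("/path/to/output_folder", "active_gene_file_name")
    print(modules_file, "exists:", modules_file.exists())
    for page in pages:
        print(page)
```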

Example files

Example network files in simplified sif format and an example active gene file are available under the "examples" folder.
