This repository contains the official python implementation of the following paper: Sarwal, Varuni, Jaqueline Brito, Serghei Mangul, and David J. Koslicki. "TAMPA: interpretable analysis and visualization of metagenomics-based taxon abundance profiles." bioRxiv (2022).
(https://www.biorxiv.org/content/10.1101/2022.04.28.489926v1.abstract)
git clone https://github.com/dkoslicki/TAMPA.git
cd TAMPA
Please follow the instructions at the following link to set up anaconda: Anaconda Setup
The following commands create a conda environment inside the repository with the dependencies.
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda create -c etetoolkit -y -n CAMIViz python=3.7 numpy ete3 seaborn pandas matplotlib biom-format
conda activate CAMIViz
Waiting for pull request to get merged
python src/tampa.py -i data/mad_yalow_0.profile.txt -g data/gs_marine_short.profile.txt class -s marmgCAMI2_short_read_sample_0 -b marine_test -k linear -r 1600 -c False -o .
This should result in a plot that looks like:
TAMPA provides a "CONTRAST MODE" to better visualize the differences between the tool and gold standard. The contrast mode can be activated by setting the parameter c to True as follows
python src/tampa.py -i data/mad_yalow_0.profile.txt -g data/gs_marine_short.profile.txt class -s marmgCAMI2_short_read_sample_0 -b marine_test -k linear -r 1600 -c True -o .
This should result in a plot that looks like:
A comprehensive list of visualization options can be obtained using
python src/tampa.py
The options are as follows:
usage: tampa.py [-h] [-i INPUT_PROFILE] [-i1 INPUT_PROFILE1]
[-g GROUND_TRUTH_INPUT_PROFILE] [-b OUTPUT_BASE_NAME]
[-t FILE_TYPE] [-s SAMPLE_OF_INTEREST] [-k SCALING]
[-a LABELS] [-y LAYOUT] [-l] [-n] [-m] [-d DB_FILE] [-r RES]
[-p] [-top TOP] [-thr THR] [-fs FONTSIZE] [-ls LABELSIZE]
[-lw LABELWIDTH] [-bm BRANCHMARGIN] [-lsep LEAF_SEP]
[-fh FIGHEIGHT] [-fw FIGWIDTH] [-nm] [-o OUTPUT_PATH]
[-dt HIGHLIGHT_DIFFERENCES_THRESHOLD] [-c CONTRAST]
[-fir INPUT1] [-sec INPUT2]
taxonomic_rank
Plot abundance of profile against ground truth on taxonomic tree.
positional arguments:
taxonomic_rank Taxonomic rank to do the plotting at
optional arguments:
-h, --help show this help message and exit
-i INPUT_PROFILE, --input_profile INPUT_PROFILE
Input taxonomic profile
-i1 INPUT_PROFILE1, --input_profile1 INPUT_PROFILE1
Second (optional) input taxonomic profile1
-g GROUND_TRUTH_INPUT_PROFILE, --ground_truth_input_profile GROUND_TRUTH_INPUT_PROFILE
Input ground truth taxonoomic profile
-b OUTPUT_BASE_NAME, --output_base_name OUTPUT_BASE_NAME
Base name for output
-t FILE_TYPE, --file_type FILE_TYPE
File type for output images (svg, png, pdf, etc.
-s SAMPLE_OF_INTEREST, --sample_of_interest SAMPLE_OF_INTEREST
If you're only interested in a single sample of
interest, specify here.
-k SCALING, --scaling SCALING
Plot scaling (log, sqrt, power etc.
-a LABELS, --labels LABELS
Specify this otion if you want to add labels to the
graph (All, Leaf, None)
-y LAYOUT, --layout LAYOUT
Chose the layout of the graph (Pie, Bar, Circle,
Rectangle
-l, --plot_l1 If you also want to plot the L1 error
-n, --normalize specify this option if you want to normalize the node
weights/relative abundances so that they sum to one
-m, --merge specify this option if you to average over all the
@SampleID's and plot a single tree
-d DB_FILE, --db_file DB_FILE
specify database dump file
-r RES, --res RES specify the resolution (dpi)
-p, --profile specify this option to use only the input profile(s)
taxID's to construct the tree
-top TOP, --top TOP specify this option to display only the top nodes with
highest abundance
-thr THR, --thr THR specify this option to display only the nodes with
abundance higher than threshold
-fs FONTSIZE, --fontsize FONTSIZE
specify this option to change the font size of the
labels
-ls LABELSIZE, --labelsize LABELSIZE
specify this option to display only the nodes with
abundance higher than threshold
-lw LABELWIDTH, --labelwidth LABELWIDTH
specify this option to display only the nodes with
abundance higher than threshold
-bm BRANCHMARGIN, --branchmargin BRANCHMARGIN
specify this option to change the branch vertical
margin
-lsep LEAF_SEP, --leaf_sep LEAF_SEP
specify this option to change the leaf separation
-fh FIGHEIGHT, --figheight FIGHEIGHT
specify this option to change the figure height (in)
-fw FIGWIDTH, --figwidth FIGWIDTH
specify this option to change the figure width (in)
-nm, --no_monitor If you are running on a server or other monitor-less
environment, use this flag to save directly to a file
-o OUTPUT_PATH, --output_path OUTPUT_PATH
Output path
-dt HIGHLIGHT_DIFFERENCES_THRESHOLD, --highlight_differences_threshold HIGHLIGHT_DIFFERENCES_THRESHOLD
If at any rank the two input samples have a difference
in abundance greater than or equal to N percent, this
taxa will be highlighted
-c CONTRAST, --contrast CONTRAST
contrast mode for comparison with gold standard
-fir INPUT1, --input1 INPUT1
Name of the first input
-sec INPUT2, --input2 INPUT2
Name of the second input
- -thr: This option can be used to display only the nodes with abundance higher than a particular threshold
- -c: Contrast mode can be used to identify problematic subregions of large trees. In the contrast mode, the false positive taxas are represented as red circles, the false negative taxas as blue circles, true positives as white, and the remaining taxas in a gradient of white to green, with the color intensity proportional to the relative error.
- -top: This option can be used to display only the top nodes with highest abundance