Explain the resulting profile

The output of motus profile or motus calc_motu is a profile with three header lines that start with #. Each line after the headers contains a taxon name/ID followed by its abundance value (a relative abundance in the example below).

Example of a motus profile:

# git tag version 2.0.0 |  motus version 2.0.0 | map_tax 2.0.0 | gene database: nr2.0.0 | calc_mgc 2.0.0 -y insert.scaled_counts -l 75 | calc_motu 2.0.0 -k mOTU -g 3 | taxonomy: ref_mOTU_2.0.0 meta_mOTU_2.0.0
# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test1
#consensus_taxonomy	test1
Kandleria vitulina [ref_mOTU_v2_0001]	0.0688211617
Methyloversatilis universalis [ref_mOTU_v2_0002]	0.0000000000
Megasphaera genomosp. [ref_mOTU_v2_0003]	0.0234955832
...
Thermoproteus uzoniensis [ref_mOTU_v2_5304]	0.0000000000
Paenibacillus sp. [ref_mOTU_v2_5305]	0.0030541740
unknown Bdellovibrio [meta_mOTU_v2_5307]	0.0000000000
unknown Alphaproteobacteria [meta_mOTU_v2_5308]	0.0000031719
...
unknown Clostridiales [meta_mOTU_v2_7800]	0.0000000000
unassigned	0.2307163722

You can remove the first two header rows, keeping the column header and the data rows, with:

tail -n+3 taxonomic_profiling.txt

Let's look at the individual parts:

Header 1

The first header describes the versions of the scripts and database that were used for profiling, as well as the parameters used for the computation. With this information it is possible to reproduce the same profile, and it is useful for checking which parameters were used for this specific profile.

Header 2

The second header contains the call that produced the profile, including the fastq files that were used.

Header 3

The third header describes what the rows represent and gives the name of the sample(s).
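As a rough illustration, the three headers can be separated from the data rows with a few lines of Python. This is only a sketch: the function name read_profile is made up for this example, the first header is shortened, and the code assumes tab-separated columns, as in the example profile above.

```python
def read_profile(lines):
    """Split a mOTUs profile into info headers, sample names and data rows."""
    headers = []   # every line that starts with '#'
    rows = {}      # taxon label -> list of values, one per sample
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            headers.append(line)
            continue
        label, *values = line.split("\t")
        rows[label] = [float(v) for v in values]
    # Assumption: the last '#' line is the column header (Header 3).
    samples = headers[-1].split("\t")[1:] if headers else []
    return headers[:-1], samples, rows


# Shortened copy of the example profile (Header 1 is abbreviated here).
example = (
    "# git tag version 2.0.0 | motus version 2.0.0\n"
    "# call: python mOTUs_v2/motus profile -s test1_single.fastq -n test1\n"
    "#consensus_taxonomy\ttest1\n"
    "Kandleria vitulina [ref_mOTU_v2_0001]\t0.0688211617\n"
    "unknown Alphaproteobacteria [meta_mOTU_v2_5308]\t0.0000031719\n"
    "unassigned\t0.2307163722\n"
)

info, samples, rows = read_profile(example.splitlines())
print(samples)
print(rows["unassigned"])
```

Here info holds Header 1 and Header 2, samples holds the sample names from Header 3, and rows maps each taxon label to its values.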

Ref-mOTUs

There are 5,232 ref-mOTUs, which represent species with a reference genome in NCBI (or other databases). The names of the ref-mOTUs are resolved at the species level. Note that mOTUs represent species based on genetic distances, which in some cases differ from the historical phenotype-based classification of prokaryotic species (see Mende et al. Nature Methods 2013 and Parks et al. Nature Biotechnology 2018 for more information).

Meta-mOTUs

There are 2,494 meta-mOTUs, which represent species without a reference genome. These mOTUs were extracted from metagenomes from human-associated biomes (oral cavity, vagina, skin and gut) and the global oceans. They are annotated through LCA (lowest common ancestor), and in most cases the annotation is not resolved at the species level. For example, unknown Alphaproteobacteria [meta_mOTU_v2_5308] is a species that belongs to the class Alphaproteobacteria and for which no genome sequence is available in NCBI.
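If you need to tell ref-mOTUs and meta-mOTUs apart in a script, the identifier in square brackets can be used. The sketch below assumes that every label follows the pattern seen in the example profile (name, then [ref_mOTU_v2_NNNN] or [meta_mOTU_v2_NNNN]). The function name split_label is hypothetical and not part of the mOTUs tool.

```python
import re

# Pattern inferred from the labels in the example profile (an assumption).
MOTU_LABEL = re.compile(
    r"^(?P<taxon>.*?)\s*\[(?P<id>(?P<kind>ref|meta)_mOTU_v2_\d+)\]$"
)


def split_label(label):
    """Return (taxon name, mOTU id, 'ref' or 'meta') for a profile row label."""
    match = MOTU_LABEL.match(label)
    if match is None:
        # Rows without an identifier, such as 'unassigned'.
        return label, None, None
    return match.group("taxon"), match.group("id"), match.group("kind")


print(split_label("Kandleria vitulina [ref_mOTU_v2_0001]"))
print(split_label("unknown Alphaproteobacteria [meta_mOTU_v2_5308]"))
print(split_label("unassigned"))
```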

Unassigned

The unassigned value at the end of the profile represents the fraction of unmapped reads. It corresponds to species that we know to be present in the sample but are not able to quantify individually. For almost all analyses it is better to remove this value, since it does not represent a single species/clade. The unassigned value becomes useful when we need to calculate relative abundances. See the following example:

True rel. ab.         mOTUs read counts        mOTUs rel. ab.
species1   20%        species1     200         species1     20%
species2   10%        species3     300         species3     30%
species3   30%        species4     100         species4     10%
species4   10%        unassigned   400         unassigned   40%
species5   30%

In the example, the sample (True rel. ab.) contains 5 species, of which only 3 are represented in the mOTUs profiler. Despite this, the relative abundances of these 3 species are correct, because we are able to measure the unassigned fraction (the unmapped reads). If you calculated the relative abundances without taking the unassigned into account, you would overestimate the profiled species:

True rel. ab.         mOTUs read counts        mOTUs rel. ab.
species1   20%        species1     200         species1   33.3%
species2   10%        species3     300         species3   50.0%
species3   30%        species4     100         species4   16.7%
species4   10%
species5   30%
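The arithmetic behind the two tables can be checked with a short Python sketch. The read counts are the ones from the example above. The function name relative_abundance is made up for this illustration and is not part of the mOTUs tool.

```python
# Read counts from the example tables above.
counts = {"species1": 200, "species3": 300, "species4": 100, "unassigned": 400}


def relative_abundance(counts, keep_unassigned=True):
    """Divide each count by the total, optionally dropping 'unassigned' first."""
    used = {
        name: value
        for name, value in counts.items()
        if keep_unassigned or name != "unassigned"
    }
    total = sum(used.values())
    return {name: value / total for name, value in used.items()}


with_unassigned = relative_abundance(counts)
without_unassigned = relative_abundance(counts, keep_unassigned=False)

for name in ("species1", "species3", "species4"):
    print(
        name,
        f"{with_unassigned[name]:.1%} with unassigned,",
        f"{without_unassigned[name]:.1%} without",
    )
```

With the unassigned count included, the total is 1000 reads and species1 comes out at 200/1000. Without it, the total drops to 600 reads and the same species is inflated to 200/600.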
