
Profile long reads

Hans-Joachim Ruscheweyh edited this page Jul 14, 2022 · 16 revisions

Since mOTUs 3.0.3 it is possible to profile long reads with mOTUs.

Given a long read fasta file (let's say long_reads.fasta), you need to first run:

motus prep_long -i long_reads.fasta -o converted_long_reads.fasta.gz

to split the long reads into shorter reads that can be profiled by a default mOTUs profile call like:

motus profile -s converted_long_reads.fasta.gz

Note that the command prep_long works on both fasta and fastq files, as well as on .gz files.
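The two steps above can be chained in a small shell function. This is only a sketch: the function name `profile_long_reads`, its argument order, and the `MOTUS` override variable are hypothetical conveniences, while the `motus prep_long` and `motus profile` calls and their `-i`, `-o`, `-s` and `-t` options are the ones used on this page.

```shell
# Hypothetical wrapper around the two documented calls.
# Arguments: <long reads> <converted reads output> <profile output> [threads]
# MOTUS is an assumed override (e.g. MOTUS=echo for a dry run); it defaults to "motus".
profile_long_reads() {
    input=$1
    converted=$2
    profile=$3
    threads=${4:-1}

    # Step 1: split the long reads into shorter reads.
    # Step 2: profile the converted reads; runs only if step 1 succeeded.
    ${MOTUS:-motus} prep_long -i "$input" -o "$converted" &&
        ${MOTUS:-motus} profile -s "$converted" -o "$profile" -t "$threads"
}
```

A call such as `profile_long_reads long_reads.fasta converted_long_reads.fasta.gz long_reads.motus 8` would then run both steps in order and stop if the conversion fails.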

How to install mOTUs

The latest version of mOTUs (3.0.3) cannot be installed via conda yet (we are working on it!). It can be installed via pip, provided the general mOTUs dependencies have been installed beforehand:

python -m pip install motu-profiler

Note that you have to install the following dependencies manually:

Example and suggestions

You can try to run mOTUs on a mock community. To download the long reads use the following command:

wget https://sunagawalab.ethz.ch/share/MOTUS_DATA/motus_3.0.3/motus_long_reads/HiFi-ATCC-MSA-1003.250k.fastq.gz

Note: This dataset is a subsample of the larger dataset SRR9328980, reduced to 10% of the original number of reads.

We first prepare the long reads for mOTUs with:

motus prep_long -i HiFi-ATCC-MSA-1003.250k.fastq.gz -o HiFi-ATCC-MSA-1003.250k.short.fastq -no_gz
gzip HiFi-ATCC-MSA-1003.250k.short.fastq # or "pigz -p 32 HiFi-ATCC-MSA-1003.250k.short.fastq" if pigz is installed

# Or download the prepared result produced by the command:
wget https://sunagawalab.ethz.ch/share/MOTUS_DATA/motus_3.0.3/motus_long_reads/HiFi-ATCC-MSA-1003.250k.short.fastq.gz

Note: We compress the file manually (hence -no_gz above) because of performance issues with the Python gzip module.
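The manual compression step can be written so that it uses pigz when available and falls back to gzip otherwise. This is a minimal sketch: the function name `compress_reads` and the default of 4 threads are assumptions made for illustration, and only the `gzip` and `pigz -p` calls mirror the commands shown above.

```shell
# Hypothetical helper: compress a file in place, preferring pigz (parallel gzip).
# Arguments: <file> [threads]; the thread count is only used by pigz.
compress_reads() {
    file=$1
    threads=${2:-4}

    if command -v pigz >/dev/null 2>&1; then
        # pigz writes <file>.gz and removes the original, like gzip.
        pigz -p "$threads" "$file"
    else
        gzip "$file"
    fi
}
```

For the example on this page, `compress_reads HiFi-ATCC-MSA-1003.250k.short.fastq 32` would produce `HiFi-ATCC-MSA-1003.250k.short.fastq.gz` either way.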

Run mOTUs with:

# We use -A to be consistent with the report shown below. -A doesn't change the profile, just the report type
# -t defines the number of threads
motus profile -A -s HiFi-ATCC-MSA-1003.250k.short.fastq.gz -o HiFi-ATCC-MSA-1003.motus -t 32

# Or download the prepared result produced by the command:
wget https://sunagawalab.ethz.ch/share/MOTUS_DATA/motus_3.0.3/motus_long_reads/HiFi-ATCC-MSA-1003.motus

Explore the result:

# Get abundances for genus level
grep "g__" HiFi-ATCC-MSA-1003.motus | grep -v "s__"


k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales|f__Pseudomonadaceae|g__Pseudomonas	0.0256625687
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales|f__Moraxellaceae|g__Acinetobacter	0.0041047739
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia	0.1690642823
k__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Rhodobacterales|f__Rhodobacteraceae|g__Rhodobacter	0.2692651120
k__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Neisseriales|f__Neisseriaceae|g__Neisseria	0.0016740789
k__Bacteria|p__Proteobacteria|c__Epsilonproteobacteria|o__Campylobacterales|f__Helicobacteraceae|g__Helicobacter	0.0019215690
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Clostridiaceae|g__Clostridium	0.0065334779
k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Bacillaceae|g__Bacillus	0.0208843590
k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Staphylococcaceae|g__Staphylococcus	0.0725363474
k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus	0.1647341426
k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Lactobacillaceae|g__Lactobacillus	0.0009095220
k__Bacteria|p__Deinococcus-Thermus|c__Deinococci|o__Deinococcales|f__Deinococcaceae|g__Deinococcus	0.0013104470
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Propionibacteriales|f__Propionibacteriaceae|g__Cutibacterium	0.0035726154
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae|g__Porphyromonas	0.1891746967
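As a quick sanity check, the same genus-level filter can be piped into awk to total the relative abundances. The sketch below assumes the report lines are tab-separated (lineage, then abundance), as in the output above; the function name `sum_genus_abundance` is hypothetical. For the genus rows listed above, the total comes to about 0.93.

```shell
# Hypothetical helper: sum the abundance column of all genus-level rows.
# Assumes "lineage<TAB>abundance" lines, as in the -A report shown above.
sum_genus_abundance() {
    grep "g__" "$1" | grep -v "s__" \
        | awk -F'\t' '{ total += $2 } END { printf "%.10f\n", total }'
}
```

It would be called as `sum_genus_abundance HiFi-ATCC-MSA-1003.motus`.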

# Get abundances at the mOTU/species level
grep "s__" HiFi-ATCC-MSA-1003.motus

k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Enterobacteriaceae|g__Escherichia|s__Escherichia coli [ref_mOTU_v3_00095]	0.1690642823
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales|f__Pseudomonadaceae|g__Pseudomonas|s__Pseudomonas aeruginosa [ref_mOTU_v3_00201]	0.0256625687
k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales|f__Moraxellaceae|g__Acinetobacter|s__Acinetobacter baumannii [ref_mOTU_v3_00259]	0.0041047739
k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Bacillaceae|g__Bacillus|s__Bacillus sp. [ref_mOTU_v3_00329]	0.0208843590
k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Staphylococcaceae|g__Staphylococcus|s__Staphylococcus aureus [ref_mOTU_v3_00340]	0.0053912499
k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Staphylococcaceae|g__Staphylococcus|s__Staphylococcus epidermidis [ref_mOTU_v3_00346]	0.0671450975
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Propionibacteriales|f__Propionibacteriaceae|g__Cutibacterium|s__Cutibacterium acnes [ref_mOTU_v3_00800]	0.0035726154
k__Bacteria|p__Proteobacteria|c__Epsilonproteobacteria|o__Campylobacterales|f__Helicobacteraceae|g__Helicobacter|s__Helicobacter pylori [ref_mOTU_v3_00897]	0.0019215690
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae|g__Porphyromonas|s__Porphyromonas gingivalis [ref_mOTU_v3_00985]	0.1891746967
k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Lactobacillaceae|g__Lactobacillus|s__Lactobacillus gasseri [ref_mOTU_v3_01039]	0.0009095220
k__Bacteria|p__Proteobacteria|c__Alphaproteobacteria|o__Rhodobacterales|f__Rhodobacteraceae|g__Rhodobacter|s__Rhodobacter sphaeroides/johrii [ref_mOTU_v3_01513]	0.2692651120
k__Bacteria|p__Proteobacteria|c__Betaproteobacteria|o__Neisseriales|f__Neisseriaceae|g__Neisseria|s__Neisseria meningitidis [ref_mOTU_v3_01539]	0.0016740789
k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus|s__Streptococcus mutans [ref_mOTU_v3_01605]	0.1567639053
k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus|s__Streptococcus agalactiae [ref_mOTU_v3_01860]	0.0079702373
k__Bacteria|p__Deinococcus-Thermus|c__Deinococci|o__Deinococcales|f__Deinococcaceae|g__Deinococcus|s__Deinococcus radiodurans [ref_mOTU_v3_02207]	0.0013104470
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Clostridiaceae|g__Clostridium|s__Clostridium beijerinckii [ref_mOTU_v3_03007]	0.0065334779
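
The species-level lines are easier to read with the lineage prefix stripped and the rows ordered by abundance. The following sketch assumes tab-separated lines with `|`-separated ranks, as in the output above; the function name `species_by_abundance` is hypothetical.

```shell
# Hypothetical helper: print "species<TAB>abundance", most abundant first.
# Assumes "lineage<TAB>abundance" lines with ranks separated by "|".
species_by_abundance() {
    grep "s__" "$1" \
        | awk -F'\t' '{ n = split($1, ranks, "|"); print ranks[n] "\t" $2 }' \
        | LC_ALL=C sort -t "$(printf '\t')" -k2,2nr
}
```

Running `species_by_abundance HiFi-ATCC-MSA-1003.motus` on the report above would list Rhodobacter sphaeroides/johrii first, since it has the highest relative abundance (0.2692651120).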