-
Notifications
You must be signed in to change notification settings - Fork 27
Profile one sample
The mOTUs profiler takes as input one or multiple fastq files. The input can be zipped in .gz or .bz2 format. The output is a file with the relative abundance (or read counts) of the prokaryotic species found in the sample.
To profile samples we will use the command profile
, type motus profile
for the list of parameters.
Content of this page:
- specify fastq input files
- specify the name of the sample
- number of threads
- use of stderr and stdout
- save profile to file
- save intermediate files
In the fastq format, it is possible to provide paired end reads:
motus profile -f forward_reads.fastq -r reverse_reads.fastq
as well as single reads that come from quality filtering:
motus profile -f forward_reads.fq -r reverse_reads.fq -s single_reads.fq
or only single reads if the sequencing was not performed with paired end reads:
motus profile -s single_reads.fq
Moreover, if the sample was divided in two (or more) lanes, is possible to profile all reads together:
motus profile -f for_r_lane1.fq,for_r_lane2.fq -r rev_r_lane1.fq,rev_r_lane2.fq
You can specify the name of the sample with the -n
option.
Example:
motus profile -s ERS199259_filtered.fq.gz -n ERS199259
In the resulting profile, the third header will be:
#consensus_taxonomy ERS199259
If you do not specify the name, then the third header will be:
#consensus_taxonomy unnamed sample
Note that the name that is specified with -n
will be used in the merge function of motus.
If you do not specify the names, and merge samples, then it will not be possible to understand which column corresponds to which sample. Example:
motus profile -s ERS199259_filtered.fq.gz -o ERS199259.motus
motus profile -s SRS190579_filtered.fq.gz -o SRS190579.motus
motus merge -i ERS199259.motus,SRS190579.motus
#consensus_taxonomy unnamed sample unnamed sample
Kandleria vitulina [ref_mOTU_v2_0001] 93 42
Methyloversatilis universalis [ref_mOTU_v2_0002] 0 4
You can specify the number of threads that you want to use with the -t
option. Example:
motus profile -s ERS199259_filtered.fq.gz -t 4
We suggest to use 4 or 8 threads, note that the time needed for profiling is not scaling linearly with the number of threads.
As default, the profile is printed to the standard output, and the messages are printed to the standard error. Hence, you can separate messages and result with the following code:
motus profile -s test1.fastq > ~/Desktop/result 2> ~/Desktop/messages
Where ~/Desktop/result
contains:
head ~/Desktop/result
# git tag version 2.1.0 | motus version 2.1.0 | map_tax 2.1.0 | gene database: nr2.1.0 | calc_mgc 2.1.0 -y insert.scaled_counts -l 75 | calc_motu 2.1.0 -k mOTU -C no_CAMI -g 3 | taxonomy: ref_mOTU_2.1.0 meta_mOTU_2.1.0
# call: python mOTUs_v2/motus profile -s test1.fastq
#consensus_taxonomy unnamed sample
Kandleria vitulina [ref_mOTU_v2_0001] 0.0688211617
Methyloversatilis universalis [ref_mOTU_v2_0002] 0.0000000000
Megasphaera genomosp. [ref_mOTU_v2_0003] 0.0234955832
And ~/Desktop/messages
contains:
head ~/Desktop/messages
[main] MAP_TAX -----------
[main] Number of detected lanes: 1
[main] Run bwa on lane 1
[map_db](map single reads) 8.68 sec
[main] Total time map_tax: 8.69 sec
[main] CALC_MGC -----------
It is possible to save the profile with the -o
option. Example:
motus profile -s test1.fastq -o ~/Desktop/result.motus
which is equivalent to:
motus profile -s test1.fastq > ~/Desktop/result.motus
The reads from fastq files are mapped through bwa (Li H. and Durbin R. (2009) Bioinformatics) to the mOTUs database and filtered for the length of the alignment (-l
option) and the percentage identity of the alignment (97%).
You can save the filtered reads with -I
to a bam file:
motus profile -s test1.fastq -I test1.bam
And profile providing this bam file as input:
motus profile -i test1.bam
Check also issue#48 for some examples on how to check the produced bam file (useful also to analyze what happen to each read from the fastq file).
If you want to run mOTUs with different parameters, you can save the intermediate bam file and provide it as input afterwards. Example:
motus profile -s test1.fq -I test1.bam -l 40 -o result_l40.motus
motus profile -i test1.bam -l 70 -o result_l70.motus
motus profile -i test1.bam -l 100 -o result_l100.motus
Here, we ran mOTUs with three different read length filters (40, 70 and 100). The first line takes 40 mins, while the last two take 5 mins each (this is due to the fact that running bwa takes 80% of the total computation time).
Note that the filtered reads saved in the first line are filtered for -l 40
, hence it's possible to then filter for -l 70
. It is not possible to do the contrary: if you filter out the reads shorter than 70, then you miss the reads with a mapping length in between 40 and 70.
In a similar manner, it is possible to save the marker gene cluster (MGC) counts with -M
and load them with -m
.