Update README.md
terrimporter authored Jun 25, 2020
1 parent 5175f69 commit 3bcfe87
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -10,7 +10,7 @@ Snakemake requires three sets of data to run: a directory containing the raw pai

The user should edit the configuration file to specify directory names, indicate the sample and read fields from the sequence filenames, and specify other required pipeline parameters such as primer sequences, marker name, and whether or not pseudogene filtering should be run.
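As a rough illustration of what "sample and read fields" could mean, the sketch below splits a fastq filename on a delimiter and picks two fields by position. The function name `parse_fields`, the underscore delimiter, the default field positions, and the example filename are all assumptions made for illustration; they are not taken from the pipeline's configuration file.

```python
import os


def parse_fields(path, sep="_", sample_field=1, read_field=4):
    """Return (sample, read) taken from delimiter-separated filename fields.

    Field positions are 1-based. All defaults here are hypothetical.
    """
    name = os.path.basename(path)
    # Strip a known fastq extension so it does not end up in the last field.
    for suffix in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    parts = name.split(sep)
    for field in (sample_field, read_field):
        if not 1 <= field <= len(parts):
            raise ValueError(f"field {field} not present in {name!r}")
    return parts[sample_field - 1], parts[read_field - 1]


# Hypothetical Illumina-style filename.
sample, read = parse_fields("raw/SampleA_S1_L001_R1_001.fastq.gz")
print(sample, read)
```

If the filenames use a different delimiter or field order, only the three keyword arguments would need to change.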

- The user also needs to install the appropriate RDP-trained classifier (see below).
+ The user also needs to install the appropriate RDP-trained classifier (see Table 1 below).

The snakefile describes the pipeline itself and normally does not need to be edited. The pipeline begins with raw paired-end Illumina MiSeq fastq.gz files. Reads are paired and primers are trimmed. All the samples are pooled for a global analysis. Reads are dereplicated and denoised, and chimeric sequences are removed, producing a reference set of denoised exact sequence variants (ESVs). At this step, the pipeline diverges into several paths: an ITS-specific dataflow, a regular dataflow, and a pseudogene filtering dataflow. For ITS sequences, flanking rRNA gene regions are removed, then the sequences are taxonomically assigned. For the regular pipeline, the denoised ESVs are taxonomically assigned using the RDP classifier. If a protein-coding marker such as rbcL is being processed, the denoised ESVs are translated and the longest open reading frames (ORFs) are retained. Obvious pseudogenes, or sequences with errors, are identified as outliers with unusually short or long sequence lengths. If COI is being processed, the denoised ESVs are translated and the longest ORFs are subjected to hidden Markov model (HMM) profile analysis. Obvious pseudogenes, or sequences with errors, are identified as outliers with unusually low HMM scores.
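The length-based outlier step described above can be sketched as follows. The text does not say how "unusually short or long" is defined, so this example assumes a common interquartile-range rule (lengths outside Q1 - 1.5 × IQR or Q3 + 1.5 × IQR are flagged). The function name `retain_by_length`, the multiplier `k`, and the ESV identifiers are hypothetical.

```python
import statistics


def retain_by_length(orf_lengths, k=1.5):
    """Return the IDs whose ORF length is not an IQR-based outlier.

    orf_lengths maps an ESV ID to the length of its longest ORF.
    The IQR rule and k=1.5 are assumptions, not the pipeline's documented cutoffs.
    """
    q1, _, q3 = statistics.quantiles(orf_lengths.values(), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {esv for esv, n in orf_lengths.items() if low <= n <= high}


# Made-up lengths: most ORFs cluster near 313 nt, two are far off.
lengths = {
    "esv1": 310, "esv2": 312, "esv3": 313, "esv4": 313, "esv5": 313,
    "esv6": 314, "esv7": 316, "esv8": 150, "esv9": 600,
}
kept = retain_by_length(lengths)
print(sorted(kept))
```

The same rule could be applied to HMM scores for COI by flagging only the low tail, again as an assumption rather than the pipeline's stated method.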
