Update README.md

terrimporter · Jun 25, 2020 · 3bcfe87
1 parent 5175f69
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/README.md b/README.md
@@ -10,7 +10,7 @@ Snakemake requires three sets of data to run: a directory containing the raw pai
 
 The user should edit the configuration file to specify directory names, indicate the sample and read fields from the sequence filenames, and specify other required pipeline parameters such as primer sequences, marker name, and whether or not pseudogene filtering should be run.
 
-The user also needs to install the appropriate RDP-trained classifier (see below).
+The user also needs to install the appropriate RDP-trained classifier (see Table 1 below).
 
 The snakefile describes the pipeline itself and normally does not need to be edited in any way.  The pipeline begins with raw paired-end Illumina MiSeq fastq.gz files.  Reads are paired.  Primers are trimmed.  All the samples are pooled for a global analysis.  Reads are dereplicated and denoised, and chimeric sequences are removed, producing a reference set of denoised exact sequence variants (ESVs).  At this step, the pipeline diverges into several paths:  an ITS-specific dataflow, a regular dataflow, and a pseudogene filtering dataflow.  For ITS sequences, flanking rRNA gene regions are removed, then the sequences are taxonomically assigned.  For the regular pipeline, the denoised ESVs are taxonomically assigned using the RDP classifier.  If a protein-coding marker is being processed, such as rbcL, then denoised ESVs are translated and the longest open reading frames (ORFs) are retained.  Obvious pseudogenes, or sequences with errors, are identified as outliers with unusually short or long sequence lengths.  If COI is being processed, denoised ESVs are translated and the longest ORFs are subjected to hidden Markov model (HMM) profile analysis.  Obvious pseudogenes, or sequences with errors, are identified as outliers with unusually low HMM scores.
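
The length-based pseudogene screen described above could be sketched roughly as follows. The text shown here does not say how "outliers" are defined, so this example assumes a common interquartile-range (IQR) rule: an ORF is flagged if its length is below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. The function names, the `k` parameter, and the dictionary input are hypothetical and are not taken from the pipeline itself.

```python
from statistics import quantiles


def length_outlier_bounds(lengths, k=1.5):
    """Return (low, high) length cutoffs using an IQR rule.

    Assumption: outliers lie more than k * IQR outside the quartiles.
    The pipeline's actual cutoff may differ.
    """
    q1, _, q3 = quantiles(lengths, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def filter_length_outliers(orfs, k=1.5):
    """Split ORFs into (kept, flagged) dictionaries by sequence length.

    `orfs` maps a sequence ID to its longest ORF (a string).
    Flagged entries are putative pseudogenes or error-containing sequences.
    """
    low, high = length_outlier_bounds([len(seq) for seq in orfs.values()], k)
    kept = {name: seq for name, seq in orfs.items() if low <= len(seq) <= high}
    flagged = {name: seq for name, seq in orfs.items() if name not in kept}
    return kept, flagged


if __name__ == "__main__":
    # Toy data: most ORFs are ~300 nt; one is far shorter, one far longer.
    example = {
        "esv1": "A" * 300,
        "esv2": "A" * 303,
        "esv3": "A" * 300,
        "esv4": "A" * 297,
        "esv5": "A" * 300,
        "esv6": "A" * 306,
        "short": "A" * 90,
        "long": "A" * 600,
    }
    kept, flagged = filter_length_outliers(example)
    print("kept:", sorted(kept))
    print("flagged:", sorted(flagged))
```

The same idea would apply to the COI path by swapping sequence length for the HMM score and flagging only the low tail, again under the assumption that an IQR-style cutoff is used.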