This Snakemake pipeline constructs full-length consensus sequences of the Influenza A genome from Nanopore reads. Centrifuge first bins the raw reads by genomic segment (segments 1-8). A draft consensus is then generated by Spoa and polished with medaka to correct remaining errors.
Required arguments:
-i|--input Path to the input samples.csv, which lists one sample name and the path to its .fastq file per line
-o|--output Path to the output directory; the final consensus sequences will be found under consensus/
--db Path to Centrifuge database for taxonomic and segment classification
Optional arguments:
-t|--threads Number of threads [Default = 32]
-s|--segment Restrict consensus calling to specific Influenza A genomic segments, given as segment numbers delimited by commas (Example: -s 1,2,5,6)
--subsample Specify the target coverage for consensus calling [Default = 1000]
-m|--model Specify the medaka model matching the flowcell chemistry used for Nanopore sequencing [Default = r941_min_high_g360]
--notrim Disable adaptor trimming by Porechop
--keep-tmp Keep all temporary files
-h|--help Display help message
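The comma-delimited format expected by -s can be checked before a long run is launched. The following is a minimal sketch, not part of the pipeline: validate_segments is a hypothetical helper, and it assumes that only the segment numbers 1 through 8 are valid and that empty entries are not allowed.

```shell
# Hypothetical helper (not part of the pipeline): succeed only if the argument
# is a comma-delimited list of Influenza A segment numbers, e.g. "1,2,5,6".
validate_segments() {
    _list=$1
    # Reject empty input and leading, trailing, or doubled commas.
    case "$_list" in
        ""|,*|*,|*,,*) return 1 ;;
    esac
    _rest=$_list
    while [ -n "$_rest" ]; do
        # Take the text before the first comma as the next entry.
        _seg=${_rest%%,*}
        # Each entry must be a single digit between 1 and 8.
        case "$_seg" in
            [1-8]) ;;
            *) return 1 ;;
        esac
        # Drop the entry just checked, or finish if it was the last one.
        case "$_rest" in
            *,*) _rest=${_rest#*,} ;;
            *) _rest= ;;
        esac
    done
    return 0
}

validate_segments "1,2,5,6" && echo "ok: 1,2,5,6"
validate_segments "1,9" || echo "invalid: 1,9"
```

The check uses only POSIX shell features, so it can be pasted into a wrapper script regardless of which shell runs it.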
Example command line for pipeline execution:
influenza_consensus.sh -i samples.csv -o /path/to/output --db /path/to/centrifuge/database
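The samples.csv passed to -i can be generated from a directory of FASTQ files. The sketch below is illustrative only: make_samples_csv is a hypothetical helper, and it assumes that the file is comma-separated with no header row, and that the sample name may be taken from the file name. Check these assumptions against the pipeline before relying on it.

```shell
# Hypothetical helper (not part of the pipeline): write one
# "sample_name,/path/to/file.fastq" line per .fastq file in a directory.
# Assumes comma-separated columns and no header row.
make_samples_csv() {
    _fastq_dir=$1
    _out_csv=$2
    # Start from an empty output file.
    : > "$_out_csv"
    for _fq in "$_fastq_dir"/*.fastq; do
        # Skip the unexpanded pattern when the directory has no .fastq files.
        [ -e "$_fq" ] || continue
        # Use the file name without its extension as the sample name.
        _name=$(basename "$_fq" .fastq)
        printf '%s,%s\n' "$_name" "$_fq" >> "$_out_csv"
    done
}

# Example usage (paths are placeholders):
#   make_samples_csv /path/to/fastq samples.csv
#   influenza_consensus.sh -i samples.csv -o /path/to/output --db /path/to/centrifuge/database
```

Passing an absolute directory path yields absolute .fastq paths in the output, which avoids ambiguity about the working directory.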
Dependencies:
- R >= 3.6
- medaka == 1.0.3
- centrifuge >= 1.0.3
- seqtk >= 1.3
- snakemake >= 5.30.1
- porechop >= 0.2.4
- spoa >= 4.0.7
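Before running the pipeline, it can help to confirm that each required tool is on the PATH. The following is a minimal sketch: missing_tools is a hypothetical helper, the executable names are assumed to match the package names listed above, and only presence is checked, not the version requirements.

```shell
# Hypothetical helper (not part of the pipeline): print the name of every
# given executable that is not found on PATH; return non-zero if any is missing.
missing_tools() {
    _status=0
    for _tool in "$@"; do
        if ! command -v "$_tool" >/dev/null 2>&1; then
            echo "$_tool"
            _status=1
        fi
    done
    return "$_status"
}

# Executable names are assumed to match the package names listed above.
missing_tools R medaka centrifuge seqtk snakemake porechop spoa \
    || echo "Install the tools printed above before running the pipeline." >&2
```

Version constraints such as medaka == 1.0.3 still need to be verified separately, for example through the environment manager used to install the tools.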