WORKFLOW: LSARP Genomics

Rauf Salamzade edited this page Dec 22, 2020 · 1 revision

The LSARP ResistanceDB Genomics workflow provides a range of genomic processing and basic analytical functions. Its parameters can be configured for different pathogen species.

A Google Sheet describing the final results that the workflow writes to the LSARP_Results/ subdirectory for each sample is available at: https://docs.google.com/spreadsheets/d/15wZwNq5UKMRTBj7sm6UsUt-KA9y-3PFQk_jiTBdr6QI/

| Parameter Identifier | Parameter Value Type / Default | Parameter Description |
| --- | --- | --- |
| run_adaptertrim | Boolean. True | Whether to run adapter trimming with TrimGalore. |
| trimgalore_options | String. | Options for running TrimGalore for adapter trimming of FASTQs. |
| run_qualitytrim | Boolean. False | Whether to run quality trimming with Trimmomatic. |
| trimmomatic_options | String. LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 | Options for running Trimmomatic for quality trimming of FASTQs. |
| run_store_input | Boolean. True | Whether to store processed FASTQ files after quality-based and adapter trimming. |
| run_centrifuge | Boolean. True | Whether to run Centrifuge. |
| centrifuge_index | String. | Path to the Centrifuge database index. |
| run_mlst_ariba | Boolean. False | Whether to run ARIBA for MLST analysis. |
| mlst_ariba_db | String. | Path to the ARIBA MLST reference database. |
| amr_ariba_card_db | String. | Path to the ARIBA CARD reference database. |
| other_ariba_db_paths | String. | Path(s) to other ARIBA reference database(s). Separate multiple paths with spaces. |
| other_ariba_db_names | String. | Name(s) of other ARIBA reference database(s). Separate multiple names with spaces, in the same order as the paths given in other_ariba_db_paths. |
| run_straingst | Boolean. True | Whether to run StrainGST analysis to find the closest strain in the sample's genus. |
| straingst_db | String. | Path to the StrainGST *.hdf5 reference database of k-mer profiles for representative strains. |
| run_pilon | Boolean. False | Whether to run Pilon variant calling against a reference. |
| reference_fasta | String. | Path to the reference FASTA. Requires that bwa index has already been run on the reference in the same directory. |
| run_subsample_for_assembly | Boolean. False | Whether to subsample reads for assembly. |
| read_subsampling | Integer. 1000000 | The number of reads to subsample for assembly. Should correspond to around 100X coverage. |
| run_assembly | Boolean. True | Whether to construct an assembly. |
| unicycler_flag | Boolean. True | Whether to use the Unicycler wrapper for Illumina-only assembly instead of running the SPAdes assembler directly. |
| spades_read_length | Integer. 150 | The length of the FASTQ reads, passed to the SPAdes assembler. Only used if SPAdes is run directly. |
| assembly_threads | Integer. 4 | The number of cores/threads to provide for Illumina assembly. |
| assembly_memory | Integer. 16 | The memory (in GB per core/thread) to provide for Illumina assembly. |
| assembly_timelimit | String. 48:00:00 | The time limit for running Illumina assembly. |
| gaemr_formatter_options | String. -g 1 -c 100 -r | Options for running GAEMR formatting/preparation for QC analysis. |
| gaemr_qc_options | String. --force --analyze_rna | Options for running GAEMR assembly QC analysis. |
| run_cleanup | Boolean. True | Whether to delete intermediate files. |
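
To illustrate how the parameters above fit together, here is a minimal Python sketch that applies per-run overrides on top of a subset of the documented defaults and pairs up the two space-separated ARIBA settings. This page does not describe how the workflow actually reads its configuration, so the dictionary layout, the function names `resolve_parameters` and `pair_ariba_databases`, the empty-string defaults for the two ARIBA settings, and the example database paths are all assumptions made for illustration only.

```python
# Subset of the defaults listed in the parameter table above.
# The empty strings for the two ARIBA settings are an assumption:
# the table gives no default for them.
DEFAULTS = {
    "run_adaptertrim": True,
    "trimmomatic_options": "LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15",
    "run_centrifuge": True,
    "run_mlst_ariba": False,
    "run_pilon": False,
    "read_subsampling": 1000000,
    "unicycler_flag": True,
    "spades_read_length": 150,
    "assembly_threads": 4,
    "assembly_memory": 16,
    "assembly_timelimit": "48:00:00",
    "other_ariba_db_paths": "",
    "other_ariba_db_names": "",
    "run_cleanup": True,
}


def resolve_parameters(overrides):
    """Return the defaults with overrides applied (hypothetical helper).

    Rejects parameter names that are not in DEFAULTS and values whose
    type differs from the type of the documented default.
    """
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise KeyError("unknown parameter(s): " + ", ".join(unknown))
    for name, value in overrides.items():
        expected = type(DEFAULTS[name])
        # Exact type comparison, so a Boolean is not accepted for an Integer.
        if type(value) is not expected:
            raise TypeError(
                f"{name} expects {expected.__name__}, got {type(value).__name__}"
            )
    return {**DEFAULTS, **overrides}


def pair_ariba_databases(params):
    """Map each extra ARIBA database name to its path (hypothetical helper).

    Both settings are space-separated and must list their entries in the
    same order, as the table describes.
    """
    paths = params["other_ariba_db_paths"].split()
    names = params["other_ariba_db_names"].split()
    if len(paths) != len(names):
        raise ValueError(
            f"{len(paths)} ARIBA database path(s) but {len(names)} name(s)"
        )
    return dict(zip(names, paths))


# Example overrides; the database paths and names are placeholders.
params = resolve_parameters({
    "run_pilon": True,
    "assembly_threads": 8,
    "other_ariba_db_paths": "/path/to/db_one /path/to/db_two",
    "other_ariba_db_names": "db_one db_two",
})
print(pair_ariba_databases(params))
```

Checking the two ARIBA lists against each other up front catches a missing or extra entry before any analysis starts, which matters because the names are matched to the paths purely by position.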