🚧 under construction 🚧
The aim of this tool is to take amplicon sequencing reads and generate a table of all variants identified, their counts, and relative abundances.
Input:
- Sequencing reads in FASTQ format (can be gzipped)
- Reference sequence of the target region in FASTA format.
Output:
- basename_clusters.txt: three-column TSV file listing, for each unique sequence, the name, count, and relative abundance
- basename_sequences.fasta: multi-FASTA file containing the sequence of each unique sequence named in basename_clusters.txt
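As a rough illustration of working with the clusters table, the sketch below builds a small made-up file and pulls out the most abundant variant. The column order (name, count, relative abundance) follows the description above; the absence of a header line, the percentage scale, and the sequence names are assumptions and may not match the real output.

```shell
# Hypothetical example data, NOT real Qontas output.
# Assumed layout: name <TAB> count <TAB> relative abundance (as a percentage), no header.
workdir=$(mktemp -d)
clusters="$workdir/sample_clusters.txt"
printf 'seq1\t120\t60\nseq2\t60\t30\nseq3\t20\t10\n' > "$clusters"

# Sort by count (column 2, numeric, descending) and take the name of the top row
tab=$(printf '\t')
top_variant=$(sort -t "$tab" -k2,2nr "$clusters" | head -n 1 | cut -f1)
echo "Most abundant variant: $top_variant"

# Sum the relative abundance column as a quick sanity check
total_abundance=$(awk -F '\t' '{ sum += $3 } END { print sum }' "$clusters")
echo "Total relative abundance reported: $total_abundance"
```

If a minimum relative abundance is applied when reporting, the reported abundances in a real file may sum to less than the full total.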
Create conda environment
conda create -n qontas -c bioconda minimap2 phylopandas pysam seqiolib seqkit vsearch -y
Clone repo and make scripts executable
git clone https://github.com/cazzlewazzle89/Qontas.git
chmod +x Qontas/*
Add the directory (eg. /home/cwwalsh/Software/Qontas) to your PATH
Handy guide here
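One common way to do this, assuming a bash-like shell, is sketched below. The directory is the example path from above and should be replaced with wherever you cloned the repo.

```shell
# Example location from above; change this to your own clone of Qontas
QONTAS_DIR="/home/cwwalsh/Software/Qontas"

# Append the directory to PATH for the current shell session only
export PATH="$PATH:$QONTAS_DIR"

# To make this permanent (assuming bash), add the same line to ~/.bashrc:
# echo 'export PATH="$PATH:/home/cwwalsh/Software/Qontas"' >> ~/.bashrc

# Check whether the script can now be found
command -v qontas.sh || echo "qontas.sh not found on PATH yet"
```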
The full pipeline is currently run using the script qontas.sh with 8 positional parameters, given in this order:
- Input FASTQ
- Input reference FASTA
- Output basename
- Minimum read length for FASTQ filtering
- Maximum read length for FASTQ filtering
- Number of times a sequence must be observed per sample to be retained for relative abundance calculation (setting this >1 is recommended to remove singletons, which are highly likely to be PCR artifacts or sequencing errors)
- Minimum relative abundance as a percentage (eg. 2 or 0.1) for a sequence to be reported
- Number of threads to use for minimap2 mapping
eg. qontas.sh sample.fastq.gz ref.fa sample 600 650 2 0.1 10
qontas will print these values to the screen and wait 10 seconds before running, giving you a chance to cancel if anything is wrong
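To make the two abundance-related parameters more concrete, here is a minimal sketch of how the filtering could work, based only on the parameter descriptions above. It is not taken from qontas.sh: the input table, file names, and the exact order of operations (drop low-count sequences first, then compute percentages over what remains, then apply the reporting cutoff) are assumptions.

```shell
# Hypothetical per-sample counts: name <TAB> count (made-up values)
workdir=$(mktemp -d)
counts="$workdir/counts.tsv"
filtered="$workdir/filtered.tsv"
printf 'seqA\t150\nseqB\t48\nseqC\t2\nseqD\t1\nseqE\t1\n' > "$counts"

min_count=2   # parameter 6: minimum number of observations to be retained
min_abund=2   # parameter 7: minimum relative abundance (percent) to be reported

# The file is read twice: the first pass totals the retained counts,
# the second pass computes percentages and applies the reporting cutoff.
awk -F '\t' -v min_count="$min_count" -v min_abund="$min_abund" '
    NR == FNR { if ($2 >= min_count) total += $2; next }
    $2 >= min_count {
        pct = 100 * $2 / total
        if (pct >= min_abund) printf "%s\t%d\t%.2f\n", $1, $2, pct
    }
' "$counts" "$counts" > "$filtered"

cat "$filtered"
```

With these made-up counts, the two singletons are dropped before the total is calculated, so percentages are taken over 200 reads rather than 202, and seqC (1%) falls below the 2% reporting cutoff.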
To do:
- automate pipeline in a (better) single executable
- create test dataset
- give option to retain or delete temp files
- give option to specify output directory
- modify to accept a list of input FASTQ files (TSV format) and output a single merged feature table
- alternatively, write a script that combines all the individual outputs
- Will need to modify to generate md5 read names so that they are groupable between samples
- include flag to modify the minimap2 -x preset, allowing PacBio (-x map-pb) or Illumina (-x sr) reads (these will probably need to be merged beforehand)
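The md5 naming idea in the list above could look something like the sketch below. This is not implemented in Qontas; it assumes names would be derived from the sequence itself, so that identical sequences receive identical names in every sample. The helper name seq_md5 is hypothetical, and md5sum is assumed to be available (GNU coreutils; macOS ships md5 instead).

```shell
# Hypothetical helper: name a sequence by the md5 hash of its uppercased bases,
# so that the same sequence gets the same name regardless of sample or letter case.
seq_md5() {
    printf '%s' "$1" | tr 'acgtn' 'ACGTN' | md5sum | cut -d ' ' -f1
}

# The same sequence seen in two samples (made-up data), plus a one-base variant
name_sample1=$(seq_md5 "ACGTACGTTTGA")
name_sample2=$(seq_md5 "acgtacgtttga")
name_other=$(seq_md5 "ACGTACGTTTGC")

echo "sample1: $name_sample1"
echo "sample2: $name_sample2"
echo "variant: $name_other"
```

Because the name depends only on the sequence, per-sample tables could then be merged on the name column. Note that this sketch treats a sequence and its reverse complement as different variants.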