
cazzlewazzle89/Qontas


Qontas

✈️ Quantification of ONT Amplicon Sequences ✈️


🚧 under construction 🚧

The aim of this tool is to take amplicon sequencing reads and generate a table of all variants identified, their counts, and relative abundances.

Input:

  1. Sequencing reads in FASTQ format (can be gzipped)
  2. Reference sequence of the target region in FASTA format.
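The minimum and maximum read-length parameters of qontas.sh are easier to choose once you know the read-length distribution of your amplicons. Below is a minimal sketch using only gzip, awk, and sort. The function name read_length_counts is hypothetical and not part of Qontas, and it assumes standard four-line FASTQ records (no wrapped sequence lines).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Qontas: print "length<TAB>number_of_reads",
# sorted by length. Assumes four-line FASTQ records (sequence on line 2 of 4).
read_length_counts() {
  # gzip -cdf decompresses .gz input and passes plain FASTQ through unchanged
  gzip -cdf "$1" \
    | awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l "\t" n[l] }' \
    | sort -n
}
```

For example, `read_length_counts sample.fastq.gz | sort -k2,2nr | head` lists the most common read lengths, which should cluster around the expected amplicon size.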

Output (where basename is the output basename passed to qontas.sh):

  1. basename_clusters.txt - three-column TSV file listing the name, count, and relative abundance of each unique sequence
  2. basename_sequences.fasta - multi-FASTA file containing each unique sequence named in basename_clusters.txt
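Because every name in basename_clusters.txt has a matching record in basename_sequences.fasta, you can pull out the sequence of a variant of interest. This is a minimal sketch; the function name get_variant is hypothetical, and it assumes the FASTA header (up to the first whitespace) is exactly the name used in the clusters table.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Qontas: print the FASTA record named $2
# from file $1. Works for single-line and multi-line (wrapped) sequences.
get_variant() {
  local fasta="$1" name="$2"
  awk -v want="$name" '
    # On each header line, decide whether to print this record
    /^>/ { hdr = substr($1, 2); keep = (hdr == want) }
    # Print header and sequence lines of the selected record
    keep' "$fasta"
}
```

Assuming the clusters table has no header line, `sort -t "$(printf '\t')" -k2,2nr sample_clusters.txt | head -n 5` shows the five most abundant variants, and `get_variant sample_sequences.fasta NAME` then retrieves the sequence for any of them.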

SETUP

Create and activate a conda environment

conda create -n qontas -c bioconda minimap2 phylopandas pysam seqiolib seqkit vsearch -y

conda activate qontas

Clone repo and make scripts executable

git clone https://github.com/cazzlewazzle89/Qontas.git

chmod +x Qontas/*

Add the Qontas directory (e.g. /home/cwwalsh/Software/Qontas) to your PATH.
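A minimal sketch for bash users, assuming the repository was cloned into $HOME/Software/Qontas (adjust QONTAS_DIR to your own location; zsh users would edit ~/.zshrc instead of ~/.bashrc):

```shell
#!/usr/bin/env bash
# Directory where the repo was cloned (assumed location; change as needed)
QONTAS_DIR="$HOME/Software/Qontas"

# Make qontas.sh findable in the current shell session
export PATH="$PATH:$QONTAS_DIR"

# Persist the change for future bash sessions, without adding duplicate lines
line="export PATH=\"\$PATH:$QONTAS_DIR\""
grep -qxF "$line" "$HOME/.bashrc" 2>/dev/null || echo "$line" >> "$HOME/.bashrc"
```

Running `command -v qontas.sh` afterwards should print the full path to the script.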

USAGE

The full pipeline is currently run with the script qontas.sh, which takes 8 positional parameters:

  1. Input FASTQ
  2. Input reference FASTA
  3. Output basename
  4. Minimum read length for FASTQ filtering
  5. Maximum read length for FASTQ filtering
  6. Number of times a sequence must be observed per sample to be retained for the relative abundance calculation (recommended to set this >1 to remove singletons, which are highly likely to be PCR artifacts or sequencing errors)
  7. Minimum relative abundance as a percentage (e.g. 2 or 0.1) for a sequence to be reported
  8. Number of threads to use for minimap2 mapping
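To make the interplay of parameters 6 and 7 concrete, here is a hedged sketch of the two filters. It is not the code in qontas.sh. It assumes relative abundance is computed as a percentage of the reads that pass the count threshold, and that the percentage threshold only removes rows from the report without renormalising the remaining abundances. The function name filter_abundance and its two-column name/count input are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the parameter 6 / parameter 7 filters; not Qontas code.
# Reads "name<TAB>count" lines on stdin, writes "name<TAB>count<TAB>percent".
filter_abundance() {
  local min_count="$1" min_pct="$2"
  awk -F'\t' -v min_count="$min_count" -v min_pct="$min_pct" '
    # Parameter 6: drop sequences seen fewer than min_count times
    $2 + 0 >= min_count + 0 { name[++n] = $1; count[n] = $2; total += $2 }
    END {
      for (i = 1; i <= n; i++) {
        # Relative abundance as a percentage of the retained reads
        pct = 100 * count[i] / total
        # Parameter 7: report only sequences at or above min_pct
        if (pct >= min_pct + 0) printf "%s\t%d\t%.4f\n", name[i], count[i], pct
      }
    }'
}
```

For example, `printf 'seqA\t50\nseqB\t30\nseqC\t1\nseqD\t20\n' | filter_abundance 2 25` prints seqA (50, 50.0000) and seqB (30, 30.0000): seqC is removed as a singleton, and seqD (20% of the 100 retained reads) falls below the 25% reporting threshold.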

For example:

qontas.sh sample.fastq.gz ref.fa sample 600 650 2 0.1 10

qontas.sh prints these values to the screen and waits 10 seconds before running, giving you a chance to cancel (Ctrl+C) if anything is wrong.
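Until multi-sample input is supported, several samples can be processed with a simple loop. This is a minimal sketch, assuming qontas.sh is on your PATH, the reads are named SAMPLE.fastq.gz, and the example thresholds above suit all samples. The function name run_qontas_batch is hypothetical. Each run still includes the 10-second pause.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of Qontas: run qontas.sh once per
# SAMPLE.fastq.gz in a directory, using SAMPLE as the output basename.
run_qontas_batch() {
  local fastq_dir="$1" ref="$2"
  local fq sample
  for fq in "$fastq_dir"/*.fastq.gz; do
    [ -e "$fq" ] || continue   # no matching files: skip the unexpanded glob
    sample="$(basename "$fq" .fastq.gz)"
    # min length, max length, min count, min relative abundance (%), threads
    qontas.sh "$fq" "$ref" "$sample" 600 650 2 0.1 10
  done
}
```

For example, `run_qontas_batch reads/ ref.fa` would produce one basename_clusters.txt / basename_sequences.fasta pair per sample.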

TO DO

  • automate the pipeline in a (better) single executable
  • create a test dataset
  • give option to retain or delete temp files
  • give option to specify output directory
  • modify to accept a list of input FASTQ files (TSV format) and output a single merged feature table
    • alternatively, write a script that combines all the individual outputs
    • will need to modify to generate md5 read names so that they are groupable between samples
  • include a flag to modify the minimap2 -x preset, allowing PacBio (-x map-pb) or Illumina (-x sr) reads (Illumina paired-end reads will probably need to be merged beforehand)
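The md5 naming idea in the list above works because identical sequences always hash to the same digest, so a variant gets the same name in every sample. A minimal sketch, assuming GNU coreutils md5sum (on macOS, `md5 -q` is the rough equivalent). The function name md5_name is hypothetical, and uppercasing before hashing is a design choice made here, not something Qontas is known to do.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Qontas: derive a sample-independent
# name for a sequence from its MD5 digest.
md5_name() {
  # Uppercase first so "acgt" and "ACGT" receive the same name;
  # printf '%s' avoids hashing a trailing newline
  printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | md5sum | cut -d' ' -f1
}
```

For example, `md5_name ACGT` prints a 32-character hexadecimal name that would be identical in every sample containing that exact sequence, which is what a merged feature table needs as its row key.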
