Given a set of index and read fastq files and a sample-to-barcode file, this workflow generates a directory structure in which each sample's forward and reverse reads are placed in their own sample directory.
- usearch
- GNU parallel
- seqkit
- xlsx2csv
- snakemake
Joins index fastq files (index1 and index2) using usearch -fastq_join
Either creates a 2-column sample-to-barcode file from an Excel file using xlsx2csv or uses a user-supplied sample2barcode.tsv file
Converts the tab-delimited sample2barcode.tsv file to a fasta file using awk
Uses Python to reformat the barcodes. You may need to edit this step in the script to match your barcodes
Demultiplexes the reads using usearch -fastx_demux
Splits the demultiplexed reads by sample, placing each sample's forward and reverse reads in a sample-specific directory. Uses GNU parallel for parallelization.
Counts and generates useful statistics on the demultiplexed reads using seqkit.
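To make the barcode preparation steps more concrete, here is a minimal Python sketch of converting a 2-column, tab-delimited sample-to-barcode table into fasta records. The workflow itself does this conversion with awk, and the exact barcode reformatting done by its Python step is not described here, so the function names and the reverse-complement option below are hypothetical illustrations rather than the workflow's own code.

```python
# Hypothetical helpers for illustration; the workflow uses awk for this step.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA barcode."""
    return seq.translate(COMPLEMENT)[::-1]


def tsv_to_fasta(lines, reformat=None):
    """Convert 'sample<TAB>barcode' lines into fasta text.

    `reformat` is an optional function applied to each barcode,
    standing in for whatever reformatting your barcodes need.
    Raises ValueError if a non-empty line has fewer than two columns.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        sample, barcode = line.split("\t")[:2]
        if reformat is not None:
            barcode = reformat(barcode)
        records.append(f">{sample}\n{barcode}\n")
    return "".join(records)


# Made-up sample names and barcodes, for demonstration only.
example = ["sampleA\tACGTAC\n", "sampleB\tTTGCAA\n"]
print(tsv_to_fasta(example), end="")
```

Passing `reformat=reverse_complement` would write each barcode as its reverse complement, which is only one example of a reformatting that some barcode layouts require.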
- Olabiyi Obayomi (@olabiyi)
Before you start, make sure you have the programs listed above installed.
Install the software listed above.
Obtain a copy of this workflow
git clone
Replace the reads, index files and sample2barcode.tsv file with your own
Configure the workflow according to your needs by editing the config.yaml file
# Get a list of samples to be pasted in the config.yaml file
SAMPLES=($(awk '{print $1}' 01.raw_data/sample2barcode.tsv))
# Quote each name whole so that names containing '-' or '.' are not split
printf '[%s]\n' "$(printf '"%s", ' "${SAMPLES[@]}" | sed 's/, $//')"
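If you prefer Python to the shell one-liner, the sketch below builds the same kind of bracketed, quoted list from the first column of the sample-to-barcode file. The function name is hypothetical, and it assumes the file has no header line. A JSON array of strings is also a valid YAML flow sequence, so the output can be pasted into config.yaml.

```python
import json


def samples_as_yaml_list(lines):
    """Return the first column of each non-empty line as a quoted list."""
    samples = [line.split()[0] for line in lines if line.strip()]
    return json.dumps(samples)


# Made-up sample names, for demonstration only.
print(samples_as_yaml_list(["sample-1\tACGT\n", "sample_2\tTTGA\n"]))
# prints ["sample-1", "sample_2"]

# With the real file (path taken from the shell command above):
# with open("01.raw_data/sample2barcode.tsv") as handle:
#     print(samples_as_yaml_list(handle))
```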
snakemake -pr --cores 10 --keep-going
Upon successful completion, your demultiplexed reads will be in a folder named 06.Split/ and statistics on them in a folder named 07.Count_Seqs/
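As a quick sanity check that is independent of the seqkit statistics, the following sketch counts the reads in every fastq file found one level below an output folder. It assumes standard four-line fastq records and file names containing `.fastq` (optionally gzipped). Both the naming pattern and the function names are assumptions, so adjust them to match your files.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a fastq file, assuming four lines per record."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def reads_per_sample(split_dir):
    """Map 'sample_dir/file_name' to its read count for each fastq file."""
    counts = {}
    # Assumed layout: <split_dir>/<sample>/<something>.fastq[.gz]
    for fastq in sorted(Path(split_dir).glob("*/*.fastq*")):
        counts[f"{fastq.parent.name}/{fastq.name}"] = count_fastq_reads(fastq)
    return counts


# Example usage after the workflow has finished:
# for name, n_reads in reads_per_sample("06.Split").items():
#     print(f"{name}\t{n_reads}")
```

For paired-end data, the forward and reverse files of a sample should report the same count.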