NGS variant calling pipeline

Overview

This repository contains a Next-Generation Sequencing (NGS) variant calling pipeline designed for processing raw sequencing data, identifying genetic variants (SNPs, indels), and annotating those variants for downstream analysis. The pipeline is modular and flexible, allowing users to customize each stage of the process.

Standard workflow of the pipeline

Quality Control (QC)
Read Trimming
Read Alignment
Post-Alignment Processing
Variant Calling
Variant Filtering
Variant Annotation

Installation

Clone the repository:
Install the required software. You can install the dependencies using a package manager like conda:

conda create -n ngs_pipeline fastqc fastp bwa samtools gatk picard bcftools annovar multiqc
conda activate ngs_pipeline

Ensure the required reference genome files (FASTA, index files) are available in the appropriate directory. Index the reference genome if not already done:

bwa index reference/genome.fa
samtools faidx reference/genome.fa
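
Before running anything, it can help to confirm that the tools and index files are actually in place. The sketch below is not part of the repository; the function names `missing_tools` and `missing_indexes` are hypothetical. It assumes the classic `bwa index` outputs (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`) plus the `.fai` file written by `samtools faidx`. Package names can differ between conda channels (GATK 4 is commonly packaged as `gatk4` on Bioconda, and ANNOVAR is usually downloaded from its own site after registration), so adjust the tool list to your setup.

```shell
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Print every expected index file that does not exist next to the FASTA.
missing_indexes() {
  ref="$1"
  for ext in amb ann bwt pac sa fai; do
    [ -e "${ref}.${ext}" ] || echo "${ref}.${ext}"
  done
}

# Usage:
#   missing_tools fastqc fastp bwa samtools gatk picard bcftools multiqc
#   missing_indexes reference/genome.fa
```

Both functions print nothing when everything is present, so an empty result means the step can be skipped.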

Usage

Quality Control (FastQC) To assess the quality of your raw sequencing data:
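
A minimal sketch of this step, assuming paired-end files named `data/<sample>_R1.fastq.gz` and `data/<sample>_R2.fastq.gz`. That naming scheme and the function below are illustrative and are not the contents of `scripts/run_fastqc.sh`.

```shell
# Run FastQC on both mates of one sample and write reports to qc_reports/.
run_fastqc() {
  sample="$1"   # hypothetical sample name, e.g. sample1
  mkdir -p qc_reports
  fastqc --outdir qc_reports \
    "data/${sample}_R1.fastq.gz" "data/${sample}_R2.fastq.gz"
}

# Usage: run_fastqc sample1
```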
Read Trimming (fastp) To trim adapters and low-quality bases from the sequencing reads:
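
One way to run fastp on a paired-end sample is sketched here. The `trimmed/` output directory and the file naming are assumptions (neither appears in the project structure), and `--detect_adapter_for_pe` is an optional choice rather than something the pipeline is known to use.

```shell
# Trim adapters and low-quality bases from one paired-end sample.
trim_reads() {
  sample="$1"   # hypothetical sample name
  mkdir -p trimmed qc_reports
  fastp \
    --in1 "data/${sample}_R1.fastq.gz" --in2 "data/${sample}_R2.fastq.gz" \
    --out1 "trimmed/${sample}_R1.trimmed.fastq.gz" \
    --out2 "trimmed/${sample}_R2.trimmed.fastq.gz" \
    --detect_adapter_for_pe \
    --html "qc_reports/${sample}.fastp.html" \
    --json "qc_reports/${sample}.fastp.json"
}

# Usage: trim_reads sample1
```

Writing the JSON report into `qc_reports/` keeps it alongside the FastQC output for later aggregation.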
Read Alignment (BWA) To align trimmed reads to the reference genome:
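
The alignment could look roughly like the following. The read-group values (`ID`, `SM`, `PL:ILLUMINA`), the default of 4 threads, and the `trimmed/` input paths are assumptions for illustration. A read group is included because GATK tools expect one on every read.

```shell
# Align trimmed paired-end reads with BWA-MEM and write a SAM file.
align_reads() {
  sample="$1"          # hypothetical sample name
  threads="${2:-4}"    # assumed default thread count
  mkdir -p alignment
  # The literal \t sequences are converted to tabs by bwa itself.
  bwa mem -t "$threads" \
    -R "@RG\tID:${sample}\tSM:${sample}\tPL:ILLUMINA" \
    reference/genome.fa \
    "trimmed/${sample}_R1.trimmed.fastq.gz" \
    "trimmed/${sample}_R2.trimmed.fastq.gz" \
    > "alignment/${sample}.sam"
}

# Usage: align_reads sample1 8
```

The output is left as unsorted SAM here because sorting and BAM conversion are handled in the post-alignment step.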
Post-Alignment Processing (Samtools/Picard) Sort and index BAM files, then mark duplicates:
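
A sketch of the post-alignment steps, assuming the conda-installed `picard` wrapper is on PATH and that the aligner wrote `alignment/<sample>.sam`. The file names are hypothetical. Picard's older `KEY=value` argument style is used here; newer releases also accept `-I`, `-O`, `-M`. In this sketch the index is built on the duplicate-marked BAM, since that is the file the variant caller reads.

```shell
# Sort the alignment, mark duplicates, then index the final BAM.
process_bam() {
  sample="$1"   # hypothetical sample name
  mkdir -p alignment qc_reports
  samtools sort -o "alignment/${sample}.sorted.bam" "alignment/${sample}.sam" &&
  picard MarkDuplicates \
    I="alignment/${sample}.sorted.bam" \
    O="alignment/${sample}.dedup.bam" \
    M="qc_reports/${sample}.dup_metrics.txt" &&
  samtools index "alignment/${sample}.dedup.bam"
}

# Usage: process_bam sample1
```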
Variant Calling (GATK HaplotypeCaller) To call variants from the aligned reads:
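
Variant calling with GATK 4 might be run as below. HaplotypeCaller needs a sequence dictionary (`reference/genome.dict`) in addition to the `.fai` index, so the sketch creates one if it is missing. The input and output file names are assumptions.

```shell
# Call variants for one sample with GATK HaplotypeCaller.
call_variants() {
  sample="$1"   # hypothetical sample name
  mkdir -p variants
  # Create the sequence dictionary once; stop if that fails.
  [ -e reference/genome.dict ] ||
    gatk CreateSequenceDictionary \
      -R reference/genome.fa -O reference/genome.dict ||
    return 1
  gatk HaplotypeCaller \
    -R reference/genome.fa \
    -I "alignment/${sample}.dedup.bam" \
    -O "variants/${sample}.raw.vcf.gz"
}

# Usage: call_variants sample1
```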
Variant Filtering (GATK/bcftools) Filter low-quality variants:
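
The two tools can be combined as sketched here: GATK `VariantFiltration` only annotates the FILTER column, and `bcftools view` then keeps records that passed. The thresholds (`QD < 2.0`, `QUAL < 30.0`) and filter names are illustrative assumptions, not values taken from this repository, and should be tuned to your data.

```shell
# Flag low-quality calls with GATK, then keep only passing records.
filter_variants() {
  sample="$1"   # hypothetical sample name
  gatk VariantFiltration \
    -R reference/genome.fa \
    -V "variants/${sample}.raw.vcf.gz" \
    -O "variants/${sample}.flagged.vcf.gz" \
    --filter-name "LowQD" --filter-expression "QD < 2.0" \
    --filter-name "LowQUAL" --filter-expression "QUAL < 30.0" &&
  # "PASS,." keeps records marked PASS or with no filter applied.
  bcftools view -f PASS,. -O v \
    -o "variants/${sample}.filtered.vcf" \
    "variants/${sample}.flagged.vcf.gz"
}

# Usage: filter_variants sample1
```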
Variant Annotation (ANNOVAR) Annotate the called variants:
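
A gene-based annotation with ANNOVAR's `table_annovar.pl` could be sketched as follows. It assumes the ANNOVAR scripts are on PATH and that the `refGene` database for the chosen build has already been downloaded into a local `humandb/` directory. The `humandb/` location, the default build `hg38`, and the input file name are all assumptions.

```shell
# Annotate a filtered VCF with refGene via table_annovar.pl.
annotate_variants() {
  sample="$1"          # hypothetical sample name
  build="${2:-hg38}"   # assumed genome build; must match your reference
  mkdir -p annotations
  table_annovar.pl "variants/${sample}.filtered.vcf" humandb/ \
    -buildver "$build" \
    -out "annotations/${sample}" \
    -remove -protocol refGene -operation g \
    -nastring . -vcfinput
}

# Usage: annotate_variants sample1 hg19
```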
MultiQC Report Aggregate QC reports from FastQC, fastp, and other tools:
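
Aggregation can be as simple as pointing MultiQC at the report directory. The sketch assumes all per-tool reports were written under `qc_reports/`; the optional directory argument is a hypothetical convenience.

```shell
# Aggregate all reports found in a directory into one MultiQC report.
run_multiqc() {
  report_dir="${1:-qc_reports}"   # assumed location of FastQC/fastp output
  # --force overwrites a report left over from an earlier run.
  multiqc "$report_dir" --outdir "$report_dir" --force
}

# Usage: run_multiqc
```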

Project structure

.
├── data/                             # Raw data files (FASTQ)
├── reference/                        # Reference genome files (FASTA, index)
├── alignment/                        # BAM files for aligned reads
├── variants/                         # VCF files with called variants
├── annotations/                      # Annotated variant files
├── qc_reports/                       # QC reports generated by FastQC, MultiQC
├── scripts/                          # Scripts for each step of the pipeline
│   ├── run_fastqc.sh                 # Script to run FastQC
│   ├── trim_reads.sh                 # Script for trimming reads with fastp
│   ├── align_reads.sh                # Script for aligning reads with BWA
│   ├── sort_bam.sh                   # Script for sorting BAM files
│   ├── mark_duplicates.sh            # Script for marking duplicates using Picard
│   ├── call_variants.sh              # Script for calling variants with GATK
│   ├── filter_variants.sh            # Script for filtering variants
│   ├── annotate_variants.sh          # Script for annotating variants with ANNOVAR
│   └── run_multiqc.sh                # Script to run MultiQC for QC aggregation
└── README.md                         # Project overview and instructions
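
The directory layout above can be created in one go. This helper is a hypothetical convenience and is not one of the scripts listed in the tree.

```shell
# Create the top-level pipeline directories under the given root.
init_project() {
  root="${1:-.}"   # defaults to the current directory
  for dir in data reference alignment variants annotations qc_reports scripts; do
    mkdir -p "${root}/${dir}"
  done
}

# Usage: init_project /path/to/project
```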
