This repository contains a pipeline for the analysis of Transcription Factor and Histone Mark ChIP-seq data.
The aim of this project is to provide a standard pipeline that covers metadata extraction, quality control, data analysis and peak calling. Given a list of GEO GSM IDs and their corresponding factors (both Histone Marks and Transcription Factors are supported), the pipeline automatically recognises the Phred quality score encoding, identifies each sample's organism, and distinguishes single-end samples from paired-end ones.
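To illustrate what recognising the Phred quality score can involve, here is a minimal sketch of a common heuristic based on the ASCII range of the quality characters. It is not taken from this repository's code: the function name `guess_phred_offset` and the thresholds are assumptions chosen for illustration, and the pipeline itself may detect the encoding differently.

```python
def guess_phred_offset(quality_lines):
    """Guess the Phred ASCII offset (33 or 64) from FASTQ quality strings.

    Hypothetical helper for illustration only.
    Returns 33, 64, or None when the observed characters are ambiguous.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None  # no quality characters to inspect

    lowest, highest = min(codes), max(codes)
    # Characters below ';' (ASCII 59) only occur in Phred+33 encodings.
    if lowest < 59:
        return 33
    # Characters above 'J' (ASCII 74) are only expected in Phred+64 encodings.
    if highest > 74:
        return 64
    # Everything falls in the range shared by both encodings.
    return None


if __name__ == "__main__":
    # Quality lines would normally be every fourth line of a FASTQ file.
    print(guess_phred_offset(["IIIIIIII#5:<"]))
```

In practice only the first few thousand reads would need to be inspected, since a handful of low-quality bases is usually enough to settle the offset.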
- Fastq file download (parallel-fastq-dump)
- Metadata Extraction (Entrez-Direct)
- Raw read QC (FastQC)
- Alignment (bowtie2)
- Mark duplicates (SAMtools)
- Filtering to remove:
- Evaluate sequencing alignment data (qualimap)
- Calculate PCR bottleneck coefficient (PBC) & Non-Redundant Fraction (NRF) (pysam)
- Call broad/narrow peaks (MACS2)
- Present QC for raw reads, alignment, peak calling and differential binding results (MultiQC)
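
The library complexity metrics in the list above can be sketched as follows. This is a minimal illustration assuming the ENCODE-style definitions (NRF = distinct read positions / total reads; PBC = positions covered by exactly one read / distinct positions); the function name `library_complexity` and the tuple layout are hypothetical, and the pipeline's own pysam-based implementation may differ.

```python
from collections import Counter


def library_complexity(positions):
    """Compute (NRF, PBC) from an iterable of read positions.

    Each position is a hashable key such as (chromosome, start, strand).
    Hypothetical helper for illustration only.
    Returns (None, None) when there are no reads.
    """
    counts = Counter(positions)
    total_reads = sum(counts.values())
    if total_reads == 0:
        return None, None

    distinct = len(counts)  # number of distinct positions
    singletons = sum(1 for n in counts.values() if n == 1)  # positions seen once

    nrf = distinct / total_reads
    pbc = singletons / distinct
    return nrf, pbc


if __name__ == "__main__":
    reads = [
        ("chr1", 100, "+"),
        ("chr1", 100, "+"),  # duplicate of the previous read
        ("chr1", 200, "-"),
        ("chr2", 50, "+"),
    ]
    nrf, pbc = library_complexity(reads)
    print(f"NRF={nrf:.2f} PBC={pbc:.2f}")
```

With pysam, the position tuples could be built while iterating over a `pysam.AlignmentFile`, for example from each read's `reference_name`, `reference_start` and `is_reverse` attributes.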