ChIP-seq Pipeline

This repository contains a pipeline for analysing Transcription Factor and Histone Mark ChIP-seq data.

Motivation

The aim of this project is to create a standard pipeline that includes metadata extraction, quality control, data analysis and peak calling. Given a list of GEO GSM IDs and their corresponding factors (Histone Marks or Transcription Factors) as input, the pipeline automatically recognises the Phred quality score encoding, identifies each sample's organism, and distinguishes single-end from paired-end samples.
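
As an illustration of how the Phred encoding of a FASTQ file can be recognised, here is a minimal sketch based on the ASCII range of the quality characters. It is not taken from the notebook: the function name `guess_phred_offset` and the thresholds are assumptions based on the commonly used ranges (Phred+33 qualities span roughly ASCII 33 to 74, while Phred+64 qualities start at 59 or 64 and can reach 104).

```python
def guess_phred_offset(quality_lines):
    """Guess the Phred offset (33 or 64) from FASTQ quality strings.

    Hypothetical helper, not part of the pipeline. Returns None when the
    observed characters are compatible with both encodings.
    """
    lowest, highest = None, None
    for line in quality_lines:
        for char in line.strip():
            code = ord(char)
            lowest = code if lowest is None else min(lowest, code)
            highest = code if highest is None else max(highest, code)

    if lowest is None:
        return None  # no quality characters seen
    if lowest < 59:
        return 33    # characters below ';' only occur in Phred+33
    if highest > 74:
        return 64    # characters above 'J' only occur in Phred+64
    return None      # ambiguous range, more reads needed


# Example quality strings (made up for illustration)
print(guess_phred_offset(["IIII#5IIII", "FFFF,FFFF"]))
print(guess_phred_offset(["hhhhddhhhh"]))
```

In practice only the first few thousand reads usually need to be inspected, and an ambiguous result can be resolved by sampling more reads.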

Pipeline summary

Fastq file download (parallel-fastq-dump)
Metadata Extraction (Entrez-Direct)
Raw read QC (FastQC)
Alignment (bowtie2)
Mark duplicates (SAMtools)
Filtering to remove:
- reads mapping to blacklisted regions (SAMtools, bedtools)
- reads that are marked as duplicates (SAMtools)
- reads that are unmapped (SAMtools)
Evaluate sequencing alignment data (qualimap)
Calculate PCR bottleneck coefficient (PBC) & Non-Redundant Fraction (NRF) (pysam)
Call broad/narrow peaks (MACS2)
Present QC for raw read, alignment, peak-calling and differential binding results (MultiQC)
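
The library complexity metrics in the summary above can be sketched as follows. This is a simplified illustration, not the notebook's code, and it assumes the ENCODE-style definitions: NRF is the number of distinct mapped positions divided by the total number of reads, PBC1 is the number of positions covered by exactly one read divided by the number of distinct positions, and PBC2 is the number of positions covered by exactly one read divided by the number covered by exactly two. The function name `library_complexity` and the tuple layout are hypothetical.

```python
from collections import Counter


def library_complexity(positions):
    """Compute NRF, PBC1 and PBC2 from an iterable of read positions.

    Each position is a hashable key such as (chromosome, start, strand).
    Hypothetical helper; a metric is None when its denominator is zero.
    """
    counts = Counter(positions)
    total_reads = sum(counts.values())
    distinct = len(counts)
    one_read = sum(1 for n in counts.values() if n == 1)   # M1
    two_reads = sum(1 for n in counts.values() if n == 2)  # M2

    return {
        "NRF": distinct / total_reads if total_reads else None,
        "PBC1": one_read / distinct if distinct else None,
        "PBC2": one_read / two_reads if two_reads else None,
    }


# Toy input: four reads, two of which share the same position
reads = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 200, "-"),
    ("chr2", 50, "+"),
]
print(library_complexity(reads))
```

With pysam, the position keys could be built from each aligned read's `reference_name`, `reference_start` and `is_reverse` attributes while iterating over the BAM file.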
