This repository has been archived by the owner on Nov 29, 2021. It is now read-only.

Pipeline_Content

tdayris-perso edited this page Jan 14, 2021 · 2 revisions

Pipeline content

Global workflow

This workflow takes FASTQ files, genome sequences and annotations as input, and returns abundance estimates along with optional quality metrics.

If you use this pipeline, please cite all of the tools listed below!

MultiQC

MultiQC, like FastQC, serves only to report quality metrics. It gathers all individual Flagstat and FastQC metrics into a single report.

Citation:

  • Ewels, Philip, et al. "MultiQC: summarize analysis results for multiple tools and samples in a single report." Bioinformatics 32.19 (2016): 3047-3048.

Salmon

Salmon is a tool for transcript quantification from RNA-seq data. It uses pseudo-mapping to compute quantification estimates on transcripts.
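
Transcript abundance estimates are commonly reported in transcripts per million (TPM). The sketch below shows how TPM values can be derived from estimated read counts and effective transcript lengths. It is a plain Python illustration, not Salmon's own code; the function name `tpm` and the input values are made up for the example.

```python
def tpm(est_counts, eff_lengths):
    """Convert estimated read counts to transcripts per million (TPM)."""
    if len(est_counts) != len(eff_lengths):
        raise ValueError("counts and lengths must have the same size")
    # Reads per base of effective length, for each transcript.
    rates = [count / length for count, length in zip(est_counts, eff_lengths)]
    total = sum(rates)
    if total == 0:
        # No reads at all: every abundance is zero.
        return [0.0] * len(rates)
    # Rescale so that all abundances sum to one million.
    return [rate / total * 1e6 for rate in rates]


# Hypothetical counts and effective lengths for three transcripts.
abundances = tpm([100, 200, 50], [1000, 2000, 250])
for name, value in zip(["tx1", "tx2", "tx3"], abundances):
    print(f"{name}\t{value:.1f}")
```

Because TPM normalizes by length before scaling, a short transcript with few reads can still have a higher abundance than a long transcript with many reads.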

Citation:

  • Patro, Rob, et al. “Salmon provides fast and bias-aware quantification of transcript expression.” Nature Methods (2017).

tximport

tximport is a tool designed to import transcript-level quantifications from Salmon and summarize them into gene-level quantifications for DESeq2.
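
The idea of going from transcripts to genes can be sketched as a simple aggregation. tximport itself is an R package and does more than this (for instance, it can account for transcript lengths), so the Python snippet below is only a simplified illustration. The function `summarize_to_genes`, the identifiers and the counts are all hypothetical.

```python
from collections import defaultdict


def summarize_to_genes(transcript_counts, tx2gene):
    """Sum transcript-level counts into gene-level counts."""
    gene_counts = defaultdict(float)
    for transcript, count in transcript_counts.items():
        gene = tx2gene.get(transcript)
        if gene is None:
            # In this sketch, transcripts absent from the mapping are skipped.
            continue
        gene_counts[gene] += count
    return dict(gene_counts)


# Hypothetical estimated counts per transcript.
transcript_counts = {"tx1": 10.5, "tx2": 4.5, "tx3": 7.0, "tx4": 3.0}
# Hypothetical transcript-to-gene mapping (tx4 is deliberately missing).
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_counts = summarize_to_genes(transcript_counts, tx2gene)
print(gene_counts)
```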

Citation:

DESeq2

DESeq2 is a widely used bioinformatics tool that performs differential gene expression analysis.

Citation:

PCAExplorer

PCAExplorer is a program that aims to ease the analysis and exploration of PCA results, their axes and the gene counts.

Citation:

EnhancedVolcano

EnhancedVolcano is a program that eases the construction and annotation of volcano plots.
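
A volcano plot places each gene at its log2 fold change (x axis) and the negative log10 of its p-value (y axis). The sketch below computes these coordinates and a simple up/down/not-significant label. It is a Python illustration of the concept, not EnhancedVolcano's R interface; the function name, the cutoffs (`lfc_cutoff=1.0`, `padj_cutoff=0.05`) and the input rows are assumptions chosen for the example.

```python
import math


def volcano_points(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return (gene, x, y, status) tuples for a volcano plot."""
    points = []
    for gene, log2fc, padj in results:
        # Guard against log10(0) for p-values reported as exactly zero.
        y = -math.log10(max(padj, 1e-300))
        if padj < padj_cutoff and log2fc >= lfc_cutoff:
            status = "up"
        elif padj < padj_cutoff and log2fc <= -lfc_cutoff:
            status = "down"
        else:
            status = "ns"  # not significant under the chosen cutoffs
        points.append((gene, log2fc, y, status))
    return points


# Hypothetical differential expression results: (gene, log2FC, adjusted p-value).
results = [("g1", 2.0, 0.001), ("g2", -1.5, 0.01), ("g3", 0.2, 0.5)]
for gene, x, y, status in volcano_points(results):
    print(f"{gene}\t{x:+.2f}\t{y:.2f}\t{status}")
```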

Citation:

Bioinfokit

Bioinfokit is a Python library designed to produce many common plots and to perform routine bioinformatics analyses.

Citation:

Snakemake

Snakemake is a pipeline/workflow manager written in Python. It handles tool interactions, dependencies, command lines and cluster reservation, and forms the skeleton of this pipeline. This pipeline is powered by the Snakemake-Wrappers, the Snakemake Workflows, and the conda project.
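
To give an intuition of how a workflow manager orders steps from their input and output files, here is a toy Python sketch. It is not how Snakemake is implemented (Snakemake works backwards from the requested targets and supports wildcards), and the rule and file names below are hypothetical, not taken from this pipeline.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical rules: name -> (input files, output files).
rules = {
    "deseq2": (["gene_counts.tsv"], ["results.tsv"]),
    "tximport": (["quant.sf"], ["gene_counts.tsv"]),
    "salmon_quant": (["index", "sample.fastq"], ["quant.sf"]),
    "salmon_index": (["transcriptome.fasta"], ["index"]),
}


def execution_order(rules):
    """Order rules so that each one runs after the rules producing its inputs."""
    # Map every output file to the rule that creates it.
    producers = {
        output: name
        for name, (_, outputs) in rules.items()
        for output in outputs
    }
    # A rule depends on the producers of its inputs; files with no
    # producer (raw data) are assumed to already exist.
    graph = {
        name: {producers[f] for f in inputs if f in producers}
        for name, (inputs, _) in rules.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(rules))
```

The rules are declared in arbitrary order on purpose: the execution order is derived from the files alone.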

Citation:

  • Köster, Johannes, and Sven Rahmann. "Snakemake—a scalable bioinformatics workflow engine." Bioinformatics 28.19 (2012): 2520-2522.

Understanding methodology

If you want to understand the ideas behind this pipeline, please read the following (the tools cited above are not repeated):

  1. Roberts, Adam, et al. Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biology 12.3 (2011): 1.
  2. Love, Michael I., Hogenesch, John B., Irizarry, Rafael A. Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation. Nature Biotechnology 34.12 (2016).
  3. Varet, Hugo, et al. "SARTools: a DESeq2-and edgeR-based R pipeline for comprehensive differential analysis of RNA-Seq data." PloS one 11.6 (2016): e0157022.
  4. Srivastava A, Sarkar H, Gupta N, Patro R; RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes, Bioinformatics, Volume 32, Issue 12, 15 June 2016, Pages i192–i200
  5. Bray, N.L., et al. Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology 34(5), 525-527 (2016).
  6. Ceppellini, R., Siniscalco, M. & Smith, C.A. The estimation of gene frequencies in a random-mating population. Ann. Hum. Genet. 20, 97–115 (1955).
  7. Dempster, A.P., Laird, N.M. & Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1–38 (1977).
  8. Chambers, John M., and Trevor J. Hastie, eds. Statistical models in S. Vol. 251. Pacific Grove, CA: Wadsworth & Brooks/Cole Advanced Books & Software, 1992.
  9. Harold J. Pimentel, Nicolas Bray, Suzette Puente, Páll Melsted and Lior Pachter, Differential analysis of RNA-Seq incorporating quantification uncertainty, Nature Methods (2017)
  10. Kanitz, A., Gypas, F., Gruber, A. J., Gruber, A. R., Martin, G., & Zavolan, M. (2015). Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data. Genome biology, 16(1), 150.
  11. Dillies, M. A., Rau, A., Aubert, J., Hennequet-Antier, C., Jeanmougin, M., Servant, N., … & Guernec, G. (2013). A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Briefings in bioinformatics, 14(6), 671-683.
  12. Storey, J. D., & Tibshirani, R. (2003). Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences, 100(16), 9440-9445.