Pipeline_Content
This workflow takes FASTQ files, genome sequences, and annotations as input, and returns abundance estimates along with optional quality metrics.
If you use this pipeline, please cite all of the tools listed below.
MultiQC, like FastQC, serves only to report quality metrics. It gathers all individual Flagstat and FastQC metrics into a single report.
Citation:
- Ewels, Philip, et al. "MultiQC: summarize analysis results for multiple tools and samples in a single report." Bioinformatics 32.19 (2016): 3047-3048.
Salmon is a tool for transcript quantification from RNA-seq data. It uses pseudo-mapping to estimate transcript abundances.
Citation:
- Patro, Rob, et al. “Salmon provides fast and bias-aware quantification of transcript expression.” Nature Methods (2017).
tximport is a tool designed to import transcript-level quantifications from Salmon and summarize them into gene-level quantifications for DESeq2.
Citation:
- Love, Michael I., Charlotte Soneson, and Mark D. Robinson. "Importing transcript abundance datasets with tximport." (2017).
- Soneson, Charlotte, Michael I. Love, and Mark D. Robinson. "Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences." F1000Research 4 (2015).
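The transcript-to-gene summarization described above can be illustrated with a minimal Python sketch. This is not tximport's implementation (tximport is an R package and also handles abundances and transcript lengths); it only shows the idea of summing estimated counts per gene. The function name `summarize_to_gene`, the identifiers, and the values are hypothetical, and skipping unmapped transcripts is a choice made for this sketch.

```python
from collections import defaultdict


def summarize_to_gene(tx_counts, tx2gene):
    """Sum transcript-level estimated counts into gene-level totals.

    tx_counts: dict mapping transcript ID -> estimated read count
    tx2gene:   dict mapping transcript ID -> gene ID
    Transcripts without an entry in tx2gene are skipped (sketch assumption).
    """
    gene_counts = defaultdict(float)
    for tx_id, count in tx_counts.items():
        gene_id = tx2gene.get(tx_id)
        if gene_id is None:
            continue  # no gene annotation for this transcript
        gene_counts[gene_id] += count
    return dict(gene_counts)


# Hypothetical transcript counts and transcript-to-gene table
tx_counts = {"tx1": 10.5, "tx2": 4.5, "tx3": 7.0, "tx4": 2.0}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_counts = summarize_to_gene(tx_counts, tx2gene)
print(gene_counts)
```

Here `tx1` and `tx2` are pooled into `geneA`, while `tx4` is dropped because it has no gene assignment.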
DESeq2 is a widely used bioinformatics tool that performs differential gene expression analysis.
Citation:
- Love, Michael I., Wolfgang Huber, and Simon Anders. "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2." Genome biology 15.12 (2014): 550.
- Love, Michael, Simon Anders, and Wolfgang Huber. "Differential analysis of count data–the DESeq2 package." Genome Biol 15.550 (2014): 10-1186.
pcaExplorer is a program that eases principal component analysis (PCA) exploration, including the principal axes and the gene counts.
Citation:
- Marini, Federico, and Harald Binder. "pcaExplorer: an R/Bioconductor package for interacting with RNA-seq principal components." BMC bioinformatics 20.1 (2019): 1-8.
EnhancedVolcano is a program that eases the construction and annotation of volcano plots.
Citation:
- Blighe, K, S Rana, and M Lewis. 2018. “EnhancedVolcano: Publication-ready volcano plots with enhanced colouring and labeling.”
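As a rough illustration of what a volcano plot encodes, the Python sketch below computes the plot coordinates of a gene (log2 fold change against -log10 of the adjusted p-value) and labels it by two cutoffs. It does not use EnhancedVolcano, which is an R package; the function names, the cutoff values, and the floor applied to p-values are assumptions made for this example and are not that package's defaults.

```python
import math


def volcano_coordinates(log2_fold_change, adjusted_p_value, p_floor=1e-300):
    """Return the (x, y) position of a gene on a volcano plot.

    x is the log2 fold change, y is -log10(adjusted p-value).
    p_floor (assumed value) avoids taking the logarithm of zero.
    """
    return (log2_fold_change, -math.log10(max(adjusted_p_value, p_floor)))


def classify_gene(log2_fold_change, adjusted_p_value,
                  lfc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene as 'up', 'down', or 'not significant'.

    Both cutoffs are illustrative assumptions.
    """
    significant = adjusted_p_value < p_cutoff
    if significant and log2_fold_change >= lfc_cutoff:
        return "up"
    if significant and log2_fold_change <= -lfc_cutoff:
        return "down"
    return "not significant"


# Hypothetical differential expression results: gene -> (log2FC, padj)
results = {
    "geneA": (2.3, 0.001),
    "geneB": (-1.5, 0.01),
    "geneC": (0.2, 0.6),
}
labels = {gene: classify_gene(lfc, padj) for gene, (lfc, padj) in results.items()}
print(labels)
```

Genes with a large absolute fold change and a small adjusted p-value end up in the upper corners of the plot, which is where the "up" and "down" labels apply.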
bioinfokit is a Python library that provides many common plots and routine analyses in bioinformatics.
Citation:
- Renesh Bedre. (2020, July 29). bioinfokit: Bioinformatics data analysis and visualization toolkit. Zenodo. doi
Snakemake is a pipeline/workflow manager written in Python. It handles tool interactions, dependencies, command lines, and cluster reservations. It is the skeleton of this pipeline. This pipeline is powered by the Snakemake-Wrappers, the Snakemake Workflows, and the conda project.
Citation:
- Köster, Johannes, and Sven Rahmann. "Snakemake—a scalable bioinformatics workflow engine." Bioinformatics 28.19 (2012): 2520-2522.
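To give an intuition for the dependency handling mentioned above, the following Python sketch orders a set of steps so that each one runs after the steps it depends on. It uses the standard library (`graphlib`, Python 3.9+) rather than Snakemake itself, which infers such a graph from the input and output files declared in its rules. The step names and the dependencies between them are assumptions for illustration and may not match the rules actually defined in this pipeline.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose outputs it needs.
# Step names and edges are hypothetical.
dependencies = {
    "salmon_index": set(),
    "salmon_quant": {"salmon_index"},
    "tximport": {"salmon_quant"},
    "deseq2": {"tximport"},
    "pcaexplorer": {"deseq2"},
    "enhancedvolcano": {"deseq2"},
    "fastqc": set(),
    "multiqc": {"fastqc"},
}

# static_order() yields every step after all of its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Several valid orders exist for this graph; the only guarantee is that a step never appears before one of its prerequisites.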
To understand the ideas behind this pipeline, please read the following (the tools above are not repeated):
- Roberts, Adam, et al. Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biology 12.3 (2011): 1.
- Love, Michael I., Hogenesch, John B., Irizarry, Rafael A. Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation. Nature Biotechnology 34.12 (2016).
- Varet, Hugo, et al. "SARTools: a DESeq2-and edgeR-based R pipeline for comprehensive differential analysis of RNA-Seq data." PloS one 11.6 (2016): e0157022.
- Srivastava A, Sarkar H, Gupta N, Patro R; RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes, Bioinformatics, Volume 32, Issue 12, 15 June 2016, Pages i192–i200
- Bray N.L. et al. (2016) Near-optimal probabilistic RNA-seq quantification. Nature Biotech., 34(5), 525-527.
- Ceppellini, R., Siniscalco, M. & Smith, C.A. The estimation of gene frequencies in a random-mating population. Ann. Hum. Genet. 20, 97–115 (1955)
- Dempster, A.P., Laird, N.M. & Rubin, D.B. Maximum Likelihood from Incomplete Data via the EM Algorithm. J. R. Stat. Soc. Ser. B 39, 1–38 (1977)
- Chambers, John M., and Trevor J. Hastie, eds. Statistical models in S. Vol. 251. Pacific Grove, CA: Wadsworth & Brooks/Cole Advanced Books & Software, 1992.
- Harold J. Pimentel, Nicolas Bray, Suzette Puente, Páll Melsted and Lior Pachter, Differential analysis of RNA-Seq incorporating quantification uncertainty, Nature Methods (2017)
- Kanitz, A., Gypas, F., Gruber, A. J., Gruber, A. R., Martin, G., & Zavolan, M. (2015). Comparative assessment of methods for the computational inference of transcript isoform abundance from RNA-seq data. Genome biology, 16(1), 150.
- Dillies, M. A., Rau, A., Aubert, J., Hennequet-Antier, C., Jeanmougin, M., Servant, N., … & Guernec, G. (2013). A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Briefings in bioinformatics, 14(6), 671-683.
- Storey, J. D., & Tibshirani, R. (2003). Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences, 100(16), 9440-9445.
Typo corrections and issues are welcome.