Indrops analysis pipeline at BioCore@CRG
The pipeline is based on the dropEst tool: https://github.com/hms-dbmi/dropEst
- File 1: barcode reads. Structure:
  - Cell barcode, part 1
  - Spacer
  - Cell barcode, part 2
  - UMI
- File 2: gene reads
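
The barcode read layout described above can be illustrated with a minimal Python sketch. It is not the dropTag implementation. The spacer sequence and the segment lengths (8 to 11 nt for barcode part 1, 8 nt for part 2, 6 nt for the UMI) are assumptions based on the commonly described inDrops v1/v2 design and are not stated in this README; `parse_barcode_read` and `BarcodeRead` are hypothetical names. The sketch only accepts an exact spacer match, whereas a real tool would likely tolerate sequencing errors.

```python
from typing import NamedTuple, Optional

# ASSUMPTIONS (not from this README): inDrops v1/v2-style layout.
SPACER = "GAGTGATTGCTTGTGACGCCTT"  # assumed fixed spacer sequence
CB1_MIN, CB1_MAX = 8, 11           # assumed length range of barcode part 1
CB2_LEN = 8                        # assumed length of barcode part 2
UMI_LEN = 6                        # assumed UMI length


class BarcodeRead(NamedTuple):
    cb1: str  # cell barcode, part 1
    cb2: str  # cell barcode, part 2
    umi: str  # unique molecular identifier


def parse_barcode_read(seq: str) -> Optional[BarcodeRead]:
    """Split a barcode read into its parts, or return None if it does not fit."""
    pos = seq.find(SPACER)  # -1 when the spacer is absent
    if not CB1_MIN <= pos <= CB1_MAX:
        return None
    cb2_start = pos + len(SPACER)
    umi_start = cb2_start + CB2_LEN
    umi_end = umi_start + UMI_LEN
    if len(seq) < umi_end:
        return None  # read too short to hold barcode part 2 and the UMI
    return BarcodeRead(seq[:pos], seq[cb2_start:umi_start], seq[umi_start:umi_end])
```

For example, `parse_barcode_read("ACGTACGTA" + SPACER + "TTTTCCCC" + "GGGAAA")` splits the read into the two barcode parts and the UMI, and a read without the spacer yields `None`.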
- QC: Runs FastQC on the raw reads. The results are stored in the QC folder.
- Indexing: Builds the genome index with STAR.
- dropTag: Creates a "tagged" FASTQ file in which each read header carries information about the single cell the read originated from.
- Alignment: Aligns the tagged reads to the indexed genome with STAR. The results are stored in the Alignments folder.
- dropEst: Estimates the read counts per gene per single cell. The results are stored in the Estimated_counts folder and consist of an R data object, a file listing the cells (i.e. barcode combinations), a file listing the genes, and a matrix in Matrix Market format (https://en.wikipedia.org/wiki/Matrix_Market_exchange_formats).
- dropReport: Reads the R data object produced by the dropEst step and generates a quality report. It needs a list of mitochondrial genes.
- multiQC: Combines the QC results from FastQC and the STAR mapping into a single report.
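
As an illustration of how the Matrix Market output of the dropEst step might be inspected, here is a minimal Python sketch that sums the counts per cell. It uses only the standard library; in practice a library reader such as `scipy.io.mmread` is usually more convenient. The sketch assumes a coordinate-format file with numeric values in which rows are genes and columns are cells. That orientation is an assumption and should be checked against the cell and gene list files. `counts_per_cell` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, TextIO


def counts_per_cell(handle: TextIO) -> Dict[int, float]:
    """Sum the values of a Matrix Market coordinate file by column index.

    ASSUMPTION: columns correspond to cells and rows to genes.
    Column indices are 1-based, as in the Matrix Market format.
    """
    header = handle.readline()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a Matrix Market coordinate file")

    totals: Dict[int, float] = defaultdict(float)
    size_line_seen = False
    for line in handle:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if not size_line_seen:
            size_line_seen = True  # first data line: rows, columns, non-zero entries
            continue
        _row, col, value = line.split()
        totals[int(col)] += float(value)
    return dict(totals)
```

The function can be called on an open file handle, for example `with open(path) as fh: totals = counts_per_cell(fh)`, where `path` points to the matrix file in the Estimated_counts folder.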