This Seurat wrapper takes aggregated Cell Ranger output and, using the Seurat package in R, performs:
- Quality control
- Batch correction (optional)
- Clustering
- Cell-type annotation using SingleR
- Differential expression (DE) analysis (optional)

It takes as input a sample sheet specifying metadata and directory paths for each sample, and generates a range of outputs for QC, visualization, and downstream analysis.
The wrapper requires either the Cell Ranger v8.0.1 output folder path or the h5 file produced by the `aggr` step. It also requires a sample sheet (`aggregate_csv_file`) containing the metadata for each sample. The sample sheet must be a CSV file with the following columns, and its rows must follow the same order as used in the `cellranger aggr` step:

| sample_id | sample_outs | donor | origin | group |
|---|---|---|---|---|
| S2 | "/path/to/outs/S2" | S2 | S2_Disease | Disease |
| S9 | "/path/to/outs/S9" | S9 | S9_Normal | Normal |
| S1 | "/path/to/outs/S1" | S1 | S1_Disease | Disease |
| S4 | "/path/to/outs/S4" | S4 | S4_Disease | Disease |
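
As a sketch of how such a sample sheet could be prepared and sanity-checked before a run, the base R snippet below writes the example rows above to a CSV file and checks that the expected columns are present and that `sample_id` values are unique. The helper `check_sample_sheet()` is hypothetical and not part of scQCAD; the file name `aggregation.csv` is borrowed from the function example further down. Row order is left untouched because it must match the `cellranger aggr` order.

```r
# Hypothetical helper (not part of scQCAD): check that a sample sheet has
# the columns described above and unique sample IDs.
check_sample_sheet <- function(path,
                               required = c("sample_id", "sample_outs",
                                            "donor", "origin", "group")) {
  sheet <- read.csv(path, stringsAsFactors = FALSE)
  missing <- setdiff(required, colnames(sheet))
  if (length(missing) > 0) {
    stop("Sample sheet is missing column(s): ", paste(missing, collapse = ", "))
  }
  if (anyDuplicated(sheet$sample_id)) {
    stop("sample_id values must be unique")
  }
  sheet
}

# Example rows from the table above, in the same order as used in cellranger aggr.
sheet <- data.frame(
  sample_id   = c("S2", "S9", "S1", "S4"),
  sample_outs = c("/path/to/outs/S2", "/path/to/outs/S9",
                  "/path/to/outs/S1", "/path/to/outs/S4"),
  donor       = c("S2", "S9", "S1", "S4"),
  origin      = c("S2_Disease", "S9_Normal", "S1_Disease", "S4_Disease"),
  group       = c("Disease", "Normal", "Disease", "Disease")
)

csv_path <- file.path(tempdir(), "aggregation.csv")
write.csv(sheet, csv_path, row.names = FALSE)  # strings are quoted by default

checked <- check_sample_sheet(csv_path)
print(checked$sample_id)  # row order is preserved
```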
scQCAD produces figures in PDF format and tables as CSV files. It also saves two R objects, which can be reloaded later for re-analysis or additional analysis. The content of the output files is briefly described below:

| filename | description |
|---|---|
| **Quality control (QC)** | |
| `pre_filter_qc_plot.pdf` | Pre-filtering QC metrics |
| `post_filter_qc_plot.pdf` | Post-filtering QC metrics |
| **Principal Component Analysis (PCA)** | |
| `<project>_pca_plots.pdf` | PCA plots |
| **Uniform Manifold Approximation and Projection (UMAP)** | |
| `<project>_no_integration_umap_plots.pdf` | UMAP before integration (conditional on presence of a batch variable) |
| **Integration (UMAP)** | |
| `<project>_<integration_method>_integrated_umap_plots.pdf` | UMAP after integration (conditional on presence of a batch variable) |
| **Clustering (UMAP)** | |
| `<project>_seurat_clusters_final_umap_plots.pdf` OR `<project>_<layer_column>_integrated_seurat_clusters_final_umap_plots.pdf` | UMAP plot(s) of Seurat clusters (after integration, conditional on presence of batch variable `<layer_column>`) |
| **Annotation (UMAP)** | |
| `<project>_annotated_umap_plots.pdf` | UMAP plot(s) of SingleR\* annotated clusters (split by `<condition_column>` if present) |
| **Marker Identification** | |
| `<project>_<cluster_name>_all_markers.pdf` OR `<project>_<cluster_name>_<condition_column>_all_markers.pdf` | FeaturePlot of the top 1 marker, Heatmap of the top 10 markers, and DotPlot of the top 6 markers of `<cluster_name>`. FeaturePlots and DotPlots are split by `<condition_column>` if present. |

- `<project>`: default name is "singleCell"
- `<integration_method>`: no default integration method
- `<layer_column>`: no default batch variable
- `<condition_column>`: no default group variable
- SingleR\*: currently supports only species "human" and "mouse"
- `<cluster_name>`: "seurat_clusters" and "singleR.labels"
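
The naming scheme in the output table above can be expressed as a small base R helper, for example to check which figures a run should have produced. This is a sketch, not scQCAD code: `scqcad_expected_pdfs()` is a hypothetical name, it assumes the table's patterns are applied literally, and it assumes the integration branch runs only when both `layer_column` and `integration_method` are supplied. CSV tables and any other files the wrapper writes are not covered.

```r
# Hypothetical helper (not part of scQCAD): list the PDF names described in
# the output table. Assumes integration runs only when both a batch variable
# (layer_column) and an integration_method are supplied.
scqcad_expected_pdfs <- function(project = "singleCell",
                                 layer_column = NULL,
                                 integration_method = NULL,
                                 condition_column = NULL) {
  integrated <- !is.null(layer_column) && !is.null(integration_method)
  files <- c("pre_filter_qc_plot.pdf",
             "post_filter_qc_plot.pdf",
             paste0(project, "_pca_plots.pdf"))
  if (integrated) {
    files <- c(files,
               paste0(project, "_no_integration_umap_plots.pdf"),
               paste0(project, "_", integration_method, "_integrated_umap_plots.pdf"),
               paste0(project, "_", layer_column,
                      "_integrated_seurat_clusters_final_umap_plots.pdf"))
  } else {
    files <- c(files, paste0(project, "_seurat_clusters_final_umap_plots.pdf"))
  }
  files <- c(files, paste0(project, "_annotated_umap_plots.pdf"))
  # One marker PDF per clustering: seurat_clusters and singleR.labels
  for (cluster_name in c("seurat_clusters", "singleR.labels")) {
    middle <- cluster_name
    if (!is.null(condition_column)) middle <- paste(middle, condition_column, sep = "_")
    files <- c(files, paste0(project, "_", middle, "_all_markers.pdf"))
  }
  files
}

scqcad_expected_pdfs("test", layer_column = "donor",
                     integration_method = "RPCAIntegration",
                     condition_column = "health_status")
```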

Currently, the following references are used for cell-type annotation:

```r
# species = "human"
# species = "mouse"
```

Each wrapper run saves two R objects in `<seurat_out_dir>`: `<project>_raw_seurat.rds` and `<project>_analysed_seurat.rds`. The former can be used to re-analyse the data without having to create the Seurat object afresh. The latter is meant for performing additional analysis, should there be a need for it. Load the object of interest as follows:

```r
# for re-analysis
seurat_obj <- readRDS("path/to/<seurat_out_dir>/<project>_raw_seurat.rds")
# for additional analysis
seurat_obj <- readRDS("path/to/<seurat_out_dir>/<project>_analysed_seurat.rds")
```

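
A small wrapper around `readRDS()` can make the choice between the two objects explicit and fail early with a clear message if the file is missing. The function `load_scqcad_object()` is hypothetical and not part of scQCAD; it assumes only the file-name patterns shown above, with `seurat_out` and `singleCell` as the documented defaults for the output directory and project name.

```r
# Hypothetical helper (not part of scQCAD): load one of the two saved objects.
#   purpose = "reanalysis" -> <project>_raw_seurat.rds
#   purpose = "additional" -> <project>_analysed_seurat.rds
load_scqcad_object <- function(seurat_out_dir = "seurat_out",
                               project = "singleCell",
                               purpose = c("reanalysis", "additional")) {
  purpose <- match.arg(purpose)
  suffix <- if (purpose == "reanalysis") "_raw_seurat.rds" else "_analysed_seurat.rds"
  path <- file.path(seurat_out_dir, paste0(project, suffix))
  if (!file.exists(path)) {
    stop("No saved object found at ", path)
  }
  readRDS(path)
}
```

For example, `load_scqcad_object("path/to/seurat_out", project = "test", purpose = "additional")` would read `path/to/seurat_out/test_analysed_seurat.rds`.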
Although the wrapper is intended to be invoked from the command line, it can also be used as an R function once it has been made available in the session, as shown below:

```r
# Run Seurat analysis
data_dir = "count/filtered_feature_bc_matrix",
data_file = NULL,
project_name = "test",
seurat_out_dir = "seurat_out",
min_cells = 3,
min_features = 100,
max_features = 3000,
percent_mt = NULL,
percent_rb = NULL,
aggr_csv_file = "aggregation.csv",
tcr_file = "vdj_t/filtered_contig_annotations.csv",
bcr_file = "vdj_b/filtered_contig_annotations.csv",
layer_column = "donor",
condition_column = "health_status",
integration_method = "RPCAIntegration",
enable_sct = TRUE,
perform_de = FALSE,
species = "human"
```

```console
$ Rscript scQCAD.R --help
Usage: scQCAD.R [options]
This script processes single-cell RNA sequencing data. Performs quality control,
filtering, normalization, batch-correction(optional), clustering and annotation.
Optionally it also does differential expression analysis. It integrates various
Seurat functions and provides pertinent figures, and tables for an exhaustive
-f FILE, --file=FILE
count matrix data h5 file name [default= NULL]
aggregate csv file [default= NULL] - file with sample_id used to aggregate
using cellranger. Order must be same as in cellranger aggregate. Additional
columns with information about donor, condition etc should be supplied here
count matrix data directory name [default= NULL]
output directory [default= seurat_out]
minimum cells [default= 3]
minimum features [default= 100]
maximum features [default= 3000]
threshold percent mitochondrial [default= NULL] - default filtering is done
using 95th quantile
threshold percent ribosomal [default= NULL] - default filtering is done using
95th quantile
output file name [default= singleCell]
V(D)J-T annotations [default= NULL]
V(D)J-B annotations [default= NULL]
describes experimental batches, donors, or conditions [default= NULL]
main condition for comparision [default= NULL]
integration method [default= NULL] - (CCAIntegration, RPCAIntegration,
HarmonyIntegration, FastMNNIntegration, scVIIntegration)
sctransform normalization [default= TRUE]
differential expression analysis [default= FALSE]
annotation for species [default= human]
-h, --help
Show this help message and exit
```

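
The help text states that, when no mitochondrial or ribosomal threshold is given, filtering falls back to the 95th quantile. The base R sketch below illustrates that idea together with the minimum/maximum feature limits on a per-cell metadata table. It is not scQCAD's code: `filter_cells()` is hypothetical, the column names `nFeature_RNA`, `percent.mt` and `percent.rb` follow common Seurat conventions but are assumptions here, and the exact comparisons the wrapper uses (strict vs. inclusive) may differ.

```r
# Sketch of threshold-based cell filtering (hypothetical, not scQCAD's code).
# meta: one row per cell with nFeature_RNA, percent.mt and percent.rb columns.
filter_cells <- function(meta, min_features = 100, max_features = 3000,
                         percent_mt = NULL, percent_rb = NULL) {
  # Fall back to the 95th quantile when no explicit threshold is given
  if (is.null(percent_mt)) percent_mt <- quantile(meta$percent.mt, 0.95, names = FALSE)
  if (is.null(percent_rb)) percent_rb <- quantile(meta$percent.rb, 0.95, names = FALSE)
  keep <- meta$nFeature_RNA > min_features &
    meta$nFeature_RNA < max_features &
    meta$percent.mt <= percent_mt &
    meta$percent.rb <= percent_rb
  meta[keep, , drop = FALSE]
}

# Toy metadata: 20 cells with 1% to 20% mitochondrial reads
meta <- data.frame(nFeature_RNA = rep(500, 20),
                   percent.mt   = 1:20,
                   percent.rb   = rep(5, 20))
kept <- filter_cells(meta)  # the 95th quantile of 1:20 is 19.05, so the 20% cell is dropped
nrow(kept)
```

With a real Seurat object, the same logic could be applied to `seurat_obj@meta.data`; in Seurat, `percent.mt` is typically computed with `PercentageFeatureSet(seurat_obj, pattern = "^MT-")` for human data.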
DE analysis is optional and is performed only if `--perform-DE` is enabled. Three scenarios are explored:
- DE between each pair of clusters (for both `seurat_clusters` and `singleR.labels`)
- DE within the same cell type (cluster) across conditions
- DE analysis with pseudo-bulking
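
Pseudo-bulking sums the raw counts of all cells that share a sample and cell type, so that DE can then be tested across biological replicates rather than individual cells. The base R sketch below shows only this aggregation step on a small dense matrix; it illustrates the general technique and is not scQCAD's implementation. The function `pseudobulk()` and the cell-type labels are hypothetical. Within Seurat itself, `AggregateExpression()` provides comparable functionality.

```r
# Sum counts per (sample, cell type) group: genes x cells -> genes x groups.
pseudobulk <- function(counts, sample, cell_type) {
  stopifnot(ncol(counts) == length(sample), length(sample) == length(cell_type))
  group <- paste(sample, cell_type, sep = "_")
  # rowsum() sums rows that share a group label, so aggregate the transposed matrix
  t(rowsum(t(counts), group))
}

# Toy genes x cells count matrix (filled column-wise: one line per cell)
counts <- matrix(c(1, 0, 2,   # cell1
                   3, 1, 0,   # cell2
                   0, 4, 1,   # cell3
                   2, 2, 2),  # cell4
                 nrow = 3,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                 paste0("cell", 1:4)))

pb <- pseudobulk(counts,
                 sample    = c("S1", "S1", "S2", "S2"),
                 cell_type = c("T", "T", "T", "B"))
pb  # one column per sample_celltype group
```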