Skip to content

Genomic-Medicine-Linkoping/scQCAD

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 

Repository files navigation

scQCAD

Repository Author

This Seurat wrapper takes aggregated cellranger output, and performs:

  • Quality control,
  • Batch correction (optional),
  • Clustering
  • Cell-type annotation using SingleR
  • Differential expression (DE) analysis (optional)
    using the Seurat package in R. It takes as input a sample sheet specifying metadata and directory paths for each sample and generates a range of outputs for QC, visualization, and downstream analysis.

Input Requirements

The wrapper will need the Cell Ranger v8.0.1 output folder path or the h5 file following aggr step. It will also need a sample sheet (aggregate_csv_file) containing the meta-data for the different samples. The aggregate_csv_file should be a CSV file with the following columns (it should follow the same order as used in cellranger aggr step).

sample_id sample_outs donor origin group
S2 "/path/to/outs/S2" S2 S2_Disease Disease
S9 "/path/to/outs/S9" S9 S9_Normal Normal
S1 "/path/to/outs/S1" S1 S1_Disease Disease
S4 "/path/to/outs/S4" S4 S4_Disease Disease

Output Files

scQCAD will produce figures in pdf format, and tables as CSV files. It will also produce two R objects in output. The R objects can be used later for additional or (re-analysis) analysisis. Here is a brief description of the output files content:

filename description
Quality control (QC)
pre_filter_qc_plot.pdf pre-filtering QC metrics
post_filter_qc_plot.pdf post-filtering QC metrics
Principal Component Analysis (PCA)
<project>_pca_plots.pdf PCA plots
Uniform Manifold Approximation and Projection (UMAP)
<project>_no_integration_umap_plots.pdf UMAP (conditonal on presence of batch variable) before integration
Integration (UMAP)
<project>_<integration_method>_integrated_umap_plots.pdf UMAP (conditonal on presence of batch variable) after integration
Clustering (UMAP)
<project>_seurat_clusters_final_umap_plots.pdf
OR
<project>_<layer_column>_integrated_seurat_clusters_final_umap_plots.pdf
umap plot(s) of seurat clusters (conditonal on presence of batch variable <layer_column> after integration)
Annotation (UMAP)
<project>_annotated_umap_plots.pdf umap plot(s) of singleR* annotated clusters (split by <condition_column> if present)
Marker Identification
<project>_<cluster_name>_all_markers.pdf
OR
<project>_<cluster_name>_<condition_column>_all_markers.pdf
FeaturePlot of top1 markers, Heatmap of top10 markers, and DotPlot of top6 markers of <cluster_name>. FeaturePlot and DotPlots split by <condition_column> if present.

<project> default name is "singleCell"
<integration_method> no default integration-method
<layer_column> no default batch-variable
<condition_column> no default group-variable
singleR* currently only supports species "human", and "mouse".
<cluster_name> "seurat_clusters" and "singleR.labels".

Currently, the following references are used for cell type annotation

# species = "human"
celldex::HumanPrimaryCellAtlasData()
# species = "mouse"
celldex::MouseRNAseqData()

The wrapper run saves two R objects

"/path/to/<seurat_out_dir>/<project>_raw_seurat.rds"
"/path/to/<seurat_out_dir>/<project>_analysed_seurat.rds"

The former can be used to re-analyse without having to create the seurat object afresh. The latter object is meant to help perform additional analysis should there be a need for it. Please follow the steps described below to load the object of interest.

library(Seurat)
# for re-analysis
seurat_obj <- readRDS("path/to/<seurat_out_dir>/<project>_raw_seurat.rds")
# for additional analysis
seurat_obj <- readRDS("path/to/<seurat_out_dir>/<project>_analysed_seurat.rds")

While the wrapper is expected to be used by command-line invocation it can also be used as a function by first making it available as shown below:

Make the functions available

source("scQCAD.R")

Run the analysis

# Run Seurat analysis
seurat_analysis(
data_dir = "count/filtered_feature_bc_matrix",
data_file = NULL,
project_name = "test",
seurat_out_dir = "seurat_out",
min_cells = 3,
min_features = 100,
max_features = 3000,
percent_mt = NULL,
percent_rb = NULL,
aggr_csv_file = "aggregation.csv",
tcr_file = "vdj_t/filtered_contig_annotations.csv",
bcr_file = "vdj_b/filtered_contig_annotations.csv",
layer_column = "donor",
condition_column = "health_status",
integration_method = "RPCAIntegration",
enable_sct = TRUE,
perform_de = FALSE,
species = "human"
)

CLI options

$ Rscript scQCAD.R --help
Usage: scQCAD.R [options]
This script processes single-cell RNA sequencing data. Performs quality control,
filtering, normalization, batch-correction(optional), clustering and annotation.
Optionally it also does differential expression analysis. It integrates various
Seurat functions and provides pertinent figures, and tables for an exhaustive
investigation.

Options:
	-f FILE, --file=FILE
		count matrix data h5 file name  [default= NULL]

	-a AGGREGATE-CSV, --aggregate-csv=AGGREGATE-CSV
		aggregate csv file [default= NULL] - file with sample_id used to aggregate
      using cellranger. Order must be same as in cellranger aggregate. Additional
      columns with information about donor, condition etc should be supplied here

	-d DIRECTORY, --directory=DIRECTORY
		count matrix data directory name [default= NULL]

	-o OUT-DIRECTORY, --out-directory=OUT-DIRECTORY
		output directory [default= seurat_out]

	--min-cells=MIN-CELLS
		minimum cells [default= 3]

	--min-features=MIN-FEATURES
		minimum features [default= 100]

	--max-features=MAX-FEATURES
		maximum features [default= 3000]

	--percent-mt=PERCENT-MT
		threshold percent mitochondrial [default= NULL] - default filtering is done
      using 95th quantile

	--percent-rb=PERCENT-RB
		threshold percent ribosomal [default= NULL] - default filtering is done using
      95th quantile

	--project=PROJECT
		output file name [default= singleCell]

	--vdj-t=VDJ-T
		V(D)J-T annotations [default= NULL]

	--vdj-b=VDJ-B
		V(D)J-B annotations [default= NULL]

	--layer-column=LAYER-COLUMN
		describes experimental batches, donors, or conditions [default= NULL]

	--condition-column=CONDITION-COLUMN
		main condition for comparision  [default= NULL]

	--integration-method=INTEGRATION-METHOD
		integration method  [default= NULL] - (CCAIntegration, RPCAIntegration,
      HarmonyIntegration, FastMNNIntegration, scVIIntegration)

	--enable-SCTransform=ENABLE-SCTRANSFORM
		sctransform normalization [default= TRUE]

	--perform-DE=PERFORM-DE
		differential expression analysis [default= FALSE]

	--species=SPECIES
		annotation for species [default= human]

	-h, --help
		Show this help message and exit

Differential Expression (DE) Analysis (optional)

DE analysis is optional and is performed if --perform-DE is enabled. Three different scenarios are explored:

  • DE between each pair of clusters (seurat_clusters and SingleR.label)
  • DE analysis within the same cell type (cluster type) across conditions
  • DE analysis with pseudo bulking

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages