
Evaluating consistency and reliability of clustering in spatial omics using STEAM


STEAM

License: MIT

One common challenge in evaluating the robustness of cell type clusters identified in spatial omics data is the lack of ground-truth cell type labels in real-world data from disease conditions. To address this, we introduce STEAM (Spatial Transcriptomics Evaluation Algorithm and Metric for clustering performance), developed to evaluate the consistency of clustering algorithms and the reliability of cell annotations in spatial omics data. Our hypothesis is that if clusters are robust and consistent across tissue regions, then a model trained on a subset of cells or spots within a cluster should accurately predict the cell type annotations of the remaining cells in that cluster, owing to spatial proximity and covarying gene expression patterns.
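As a rough illustration of this premise, the sketch below simulates two spatially separated clusters, samples 80% of the spots within each cluster for training, and predicts the labels of the held-out spots. It is not STEAM's implementation: the data are simulated, `stratified_split()` and `knn_predict()` are hypothetical helpers, and a simple k-nearest-neighbour vote on scaled coordinates plus expression stands in for the models STEAM actually uses.

```r
# Hypothetical toy sketch of STEAM's premise (not the package's code):
# hold out part of each cluster and predict the held-out labels from the rest.
set.seed(1)

# Simulated spots: two spatially separated clusters with distinct expression.
n <- 100
coords <- rbind(matrix(rnorm(2 * n, mean = 0), ncol = 2),
                matrix(rnorm(2 * n, mean = 5), ncol = 2))
expr <- rbind(matrix(rnorm(3 * n, mean = 1), ncol = 3),
              matrix(rnorm(3 * n, mean = 3), ncol = 3))
clusters <- factor(rep(c("A", "B"), each = n))

# Stratified split: sample a fixed fraction of spots within each cluster.
stratified_split <- function(labels, train_ratio) {
  idx <- split(seq_along(labels), labels)
  unlist(lapply(idx, function(i) sample(i, size = round(length(i) * train_ratio))),
         use.names = FALSE)
}

# k-nearest-neighbour majority vote using Euclidean distance.
knn_predict <- function(train_x, train_y, test_x, k = 5) {
  apply(test_x, 1, function(query) {
    d <- sqrt(colSums((t(train_x) - query)^2))
    votes <- table(train_y[order(d)[seq_len(k)]])
    names(votes)[which.max(votes)]
  })
}

# Combine spatial and expression features on a common scale.
features <- scale(cbind(coords, expr))
train_idx <- stratified_split(clusters, train_ratio = 0.8)
pred <- knn_predict(features[train_idx, ], clusters[train_idx],
                    features[-train_idx, ], k = 5)
accuracy <- mean(pred == as.character(clusters[-train_idx]))
print(accuracy)
```

Well-separated clusters like these should give an accuracy close to 1; clusters that mix spatially or in expression would lower it, which is the signal STEAM is designed to pick up.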

STEAM incorporates several machine learning models, including Random Forest, XGBoost, and SVM, to assess the prediction accuracy and consistency of clusters, together with metrics such as the Kappa score, F1 score, and Adjusted Rand Index (ARI). We demonstrated STEAM on both multi-cell and single-cell resolution spatial transcriptomics and proteomics data. Notably, STEAM supports multi-sample training, enabling evaluation of clustering consistency across replicates. We also used STEAM to compare spatial-aware and spatial-ignorant clustering methods, giving researchers a tool for more informed interpretation of clustering results.
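The metrics above compare cluster labels with model predictions. A minimal base-R sketch of three of them follows, using the standard definitions of Cohen's kappa, macro-averaged F1, and the Adjusted Rand Index. The function names are hypothetical, and STEAM may compute or average these scores differently.

```r
# Hypothetical helpers (not STEAM's API): agreement between true and
# predicted labels, written in base R.

# Square confusion matrix over the union of observed labels.
confusion <- function(truth, pred) {
  lv <- union(levels(factor(truth)), levels(factor(pred)))
  table(factor(truth, levels = lv), factor(pred, levels = lv))
}

# Cohen's kappa: observed agreement corrected for chance agreement.
cohen_kappa <- function(truth, pred) {
  cm <- confusion(truth, pred)
  n <- sum(cm)
  p_obs <- sum(diag(cm)) / n
  p_exp <- sum(rowSums(cm) * colSums(cm)) / n^2
  (p_obs - p_exp) / (1 - p_exp)
}

# Macro-averaged F1: mean of per-class F1 (undefined F1 counts as 0).
macro_f1 <- function(truth, pred) {
  cm <- confusion(truth, pred)
  tp <- diag(cm)
  precision <- tp / colSums(cm)
  recall <- tp / rowSums(cm)
  f1 <- 2 * precision * recall / (precision + recall)
  f1[is.na(f1)] <- 0
  mean(f1)
}

# Adjusted Rand Index computed from the contingency table.
adjusted_rand_index <- function(a, b) {
  cm <- table(a, b)
  n <- sum(cm)
  sum_ij <- sum(choose(as.vector(cm), 2))
  sum_a <- sum(choose(rowSums(cm), 2))
  sum_b <- sum(choose(colSums(cm), 2))
  expected <- sum_a * sum_b / choose(n, 2)
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

truth <- c("A", "A", "A", "B", "B", "C")
pred  <- c("A", "A", "B", "B", "B", "C")
cohen_kappa(truth, pred)          # 17/23, about 0.739
macro_f1(truth, pred)             # 13/15, about 0.867
adjusted_rand_index(truth, pred)  # 7/22, about 0.318
```

In practice, `caret::confusionMatrix()` reports accuracy and Kappa, and `mclust::adjustedRandIndex()` computes the ARI; hand-rolled versions like these are mainly useful for seeing what each score measures.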


Installation

You can install the STEAM package from GitHub using devtools:

# install.packages("devtools")
devtools::install_github("fanzhanglab/STEAM")

or, using remotes:

# install.packages("remotes")
remotes::install_github("fanzhanglab/STEAM")

Dependencies

- R (>= 4.2)
- ggplot2 (>= 3.4.2)
- caret
- randomForest
- e1071
- scales
- gridExtra
- grid
- reshape2
- viridis

Tutorials

Step-by-step notebook applying STEAM to 10x Visium human brain data from the dorsolateral prefrontal cortex (DLPFC).

The main steps for running STEAM are shown below:

# Create a STEAM object from the expression matrix, spatial coordinates, and cluster labels
STEAM.Obj <- LoadSTEAM(count_exp = matrix, spatial = coordinates, labels = labels, Seurat.obj = NULL)
# Train and evaluate a random forest ("rf") with 10-fold cross-validation repeated 3 times
STEAM.Obj <- RunSTEAM(STEAM.Obj, train.ratio = 0.8, n.size = 5, seed = 123, cv.folds = 10, cv.repeats = 3,
                      trainval.ratio = 0.8, model = "rf", n.tree = 500, kernel = 'linear',
                      train.folder.name = 'train.out', allowParallel = FALSE)
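A common source of errors is passing inputs whose spots are not in the same order. The hypothetical `check_steam_inputs()` helper below is a sketch under an assumption this README does not state: that spots are the columns of the expression matrix, the rows of the coordinate table, and the names of the label vector. Adjust it if `LoadSTEAM()` expects a different orientation.

```r
# Hypothetical pre-flight check (not part of STEAM): confirm that the expression
# matrix, coordinates, and labels describe the same spots in the same order.
# ASSUMPTION: spots are columns of count_exp, rows of coordinates,
# and names of labels.
check_steam_inputs <- function(count_exp, coordinates, labels) {
  spots <- colnames(count_exp)
  if (is.null(spots)) stop("count_exp needs column names (spot IDs)")
  if (!identical(rownames(coordinates), spots))
    stop("rownames(coordinates) must match colnames(count_exp), in order")
  if (!identical(names(labels), spots))
    stop("names(labels) must match colnames(count_exp), in order")
  if (anyNA(labels)) stop("labels contain NA values")
  invisible(TRUE)
}

# Toy inputs: 3 genes x 4 spots.
spots <- paste0("spot", 1:4)
count_exp <- matrix(rpois(12, lambda = 5), nrow = 3,
                    dimnames = list(paste0("gene", 1:3), spots))
coordinates <- data.frame(x = c(1, 2, 1, 2), y = c(1, 1, 2, 2), row.names = spots)
labels <- setNames(c("L1", "L1", "L2", "L2"), spots)
check_steam_inputs(count_exp, coordinates, labels)

# A shuffled coordinate table fails the check until it is reordered by spot ID.
shuffled <- coordinates[c(4, 2, 3, 1), ]
aligned <- shuffled[colnames(count_exp), ]
check_steam_inputs(count_exp, aligned, labels)
```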

Citations

Reynoso, S., Schiebout, C., Krishna, R., Zhang, F. STEAM: Spatial Transcriptomics Evaluation Algorithm and Metric for clustering performance, bioRxiv, 2025


Help, Suggestions and Contributions

If you have questions, comments, or suggestions, or want to report a coding issue in STEAM, we strongly encourage you to use the GitHub Issues section rather than sending emails.

  • Please check the GitHub issues for similar issues that have already been reported and resolved. This helps the team focus on adding new features and working on cool projects instead of resolving the same issues repeatedly!
  • Please include an example when filing a GitHub issue. Where relevant, share your STEAM object and the related code so we can understand the issue.

Contact

Please contact fanzhanglab@gmail.com with further questions or potential collaboration opportunities!
