One common challenge in evaluating the robustness of identified cell type clusters in spatial omics is the lack of ground truth cell type labels in real-world data from disease conditions. To address this, we introduce STEAM, a Spatial Transcriptomics Evaluation Algorithm and Metric for clustering performance, developed to evaluate the consistency of clustering algorithms and the reliability of cell annotations in spatial omics data. Our hypothesis is that if clusters are robust and consistent across tissue regions, then a subset of cells or spots selected from a cluster should enable accurate prediction of the cell type annotations of the remaining cells in that cluster, owing to their spatial proximity and covarying gene expression patterns.
STEAM incorporates several machine learning models, including Random Forest, XGBoost, and SVM, to assess the prediction accuracy and consistency of clusters, along with statistical metrics such as the Kappa score, F1 score, and Adjusted Rand Index (ARI). We demonstrated the capability of STEAM on multi-cell and single-cell resolution spatial transcriptomics and proteomics data. Notably, STEAM supports multi-sample training, enabling the evaluation of cross-replicate clustering consistency. Furthermore, we used STEAM to evaluate the performance of spatial-aware and spatial-ignorant clustering methods, offering researchers a valuable tool for more informed interpretation of their results.
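The hold-out idea described above can be illustrated with a minimal base R sketch. This is not STEAM's implementation: the simulated spots, the helper `knn_predict`, and the choice of a spatial nearest-neighbor vote (instead of Random Forest, XGBoost, or SVM) are all assumptions made for illustration. It holds out 20% of the spots, predicts their labels from the remaining 80%, and computes accuracy, Cohen's kappa, and per-class F1 from the confusion matrix.

```r
# Toy illustration only: this is NOT STEAM's implementation.
set.seed(123)

# Simulate 300 spots in three spatially separated clusters (hypothetical data).
n_per <- 100
centers <- rbind(c(0, 0), c(6, 0), c(3, 6))
coords <- do.call(rbind, lapply(1:3, function(i) {
  cbind(x = rnorm(n_per, centers[i, 1], 1),
        y = rnorm(n_per, centers[i, 2], 1))
}))
labels <- factor(rep(c("A", "B", "C"), each = n_per))

# Hold out 20% of the spots (an 80/20 train/test split).
train_idx <- sample(nrow(coords), size = 0.8 * nrow(coords))
test_idx <- setdiff(seq_len(nrow(coords)), train_idx)

# Hypothetical helper: predict each held-out label by majority vote
# among its k nearest training spots in coordinate space.
knn_predict <- function(train_xy, train_lab, test_xy, k = 5) {
  pred <- apply(test_xy, 1, function(p) {
    d <- sqrt((train_xy[, 1] - p[1])^2 + (train_xy[, 2] - p[2])^2)
    nn <- order(d)[seq_len(k)]
    names(which.max(table(train_lab[nn])))
  })
  factor(pred, levels = levels(train_lab))
}

pred <- knn_predict(coords[train_idx, ], labels[train_idx], coords[test_idx, ])
truth <- labels[test_idx]

# Confusion matrix, accuracy, and Cohen's kappa.
cm <- table(truth, pred)
n <- sum(cm)
accuracy <- sum(diag(cm)) / n
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

# Per-class F1 score from precision and recall.
precision <- diag(cm) / colSums(cm)
recall <- diag(cm) / rowSums(cm)
f1 <- 2 * precision * recall / (precision + recall)

print(cm)
print(round(c(accuracy = accuracy, kappa = cohen_kappa), 3))
print(round(f1, 3))
```

When the labels form spatially coherent groups, as in this simulation, the held-out spots are predicted almost perfectly and kappa is close to 1. Labels that are scattered without spatial or expression structure would give values much closer to 0, which is the contrast the hypothesis relies on.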
You can install the STEAM package from GitHub using devtools as follows:
# install.packages("devtools")
devtools::install_github("fanzhanglab/STEAM")
(OR)
remotes::install_github("fanzhanglab/STEAM")
- R (\>= 4.2)
- ggplot2 (\>= 3.4.2)
- caret
- randomForest
- e1071
- scales
- gridExtra
- grid
- reshape2
- viridis
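Before running STEAM, it can help to confirm that the requirements listed above are met. The following is a minimal sketch using only base R functions; the variable names (`deps`, `installed`, `missing_deps`, `r_ok`, `ggplot2_ok`) are illustrative and not part of STEAM.

```r
# Package names are taken from the dependency list above.
# 'grid' ships with R itself, so it should always be found.
deps <- c("ggplot2", "caret", "randomForest", "e1071", "scales",
          "gridExtra", "grid", "reshape2", "viridis")

# TRUE for each package that can be loaded from the current library paths.
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
missing_deps <- deps[!installed]

# Check the minimum versions stated in the list (R >= 4.2, ggplot2 >= 3.4.2).
r_ok <- getRversion() >= "4.2"
ggplot2_ok <- installed[["ggplot2"]] &&
  utils::packageVersion("ggplot2") >= "3.4.2"

if (length(missing_deps) > 0) {
  message("Missing packages: ", paste(missing_deps, collapse = ", "))
}
if (!r_ok) {
  message("R 4.2 or later is required; found ", as.character(getRversion()))
}
if (!ggplot2_ok) {
  message("ggplot2 3.4.2 or later is required")
}
```

Any packages reported as missing can then be installed with `install.packages()` before installing STEAM.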
Step-by-step notebook of applying STEAM on 10X Visium Human Brain Data (DLPFC):
# Create a new STEAM object for the loaded spatial transcriptomic data
STEAM.Obj <- LoadSTEAM(count_exp = matrix, spatial = coordinates, labels = labels, Seurat.obj = NULL)
STEAM.Obj <- RunSTEAM(STEAM.Obj, train.ratio = 0.8, n.size = 5, seed = 123, cv.folds = 10, cv.repeats = 3, trainval.ratio = 0.8, model = "rf", n.tree = 500, kernel = 'linear', train.folder.name = 'train.out', allowParallel = FALSE)
Reynoso, S., Schiebout, C., Krishna, R., Zhang, F. STEAM: Spatial Transcriptomics Evaluation Algorithm and Metric for clustering performance, bioRxiv, 2025
If you have any questions, comments, or suggestions, or want to report coding-related issues with STEAM, we strongly encourage you to use the GitHub issues section rather than sending emails.
- Please check the GitHub issues for similar issues that have already been reported and resolved. This helps the team focus on adding new features and working on cool projects instead of resolving the same issues!
- Examples are required when filing a GitHub issue. In certain cases, please share your STEAM object and related code so that we can understand the issue.
Please contact fanzhanglab@gmail.com for further questions or potential collaborative opportunities!