Identify variable genes in scRNA-seq and spatial transcriptomics data using Bayesian inference

bayesVG

License: MIT

Installation

You can install the most recent version of bayesVG using:

remotes::install_github("jr-leary7/bayesVG")

Usage

Libraries

library(dplyr)
library(Seurat)
library(bayesVG)

HVG detection

Data

First, we load the 10X Genomics pbmc3k dataset, which is composed of 2,700 peripheral blood mononuclear cells from a single healthy donor.

data("seu_pbmc")

Modeling

Now we can model gene expression, summarize the posterior distribution of variance for each gene, and classify the 3,000 most variable genes as HVGs. The findVariableFeaturesBayes() function accepts either a Seurat or a SingleCellExperiment object.

seu_pbmc <- findVariableFeaturesBayes(seu_pbmc, 
                                      n.cells.subsample = 500L, 
                                      algorithm = "meanfield",
                                      save.model = TRUE) %>% 
            classifyHVGs(n.HVG = 3000L)

We can extract the summary table, sort it by mean dispersion, and pull the top 3,000 HVGs like so. These genes can then be used as the basis for downstream analyses such as PCA, clustering, and UMAP visualization.

summary_hvg <- arrange(seu_pbmc@assays$RNA@meta.data, desc(dispersion_mean))
top3k_hvgs <- summary_hvg$gene[1:3000]
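
The ranking step above can be illustrated on a toy table in base R. This is only a sketch: the data frame, its gene names and values, and the select_top_genes() helper are made up for illustration and are not part of bayesVG. It also shows one small design choice: head() returns at most n entries, whereas indexing with 1:n pads the result with NA when the table has fewer than n rows.

```r
# Toy stand-in for the per-gene summary table; the gene names and values
# are invented purely for illustration.
toy_summary <- data.frame(
  gene = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  dispersion_mean = c(0.8, 2.5, 1.1, 3.2, 0.4),
  stringsAsFactors = FALSE
)

# Rank genes from most to least variable, mirroring arrange(desc(dispersion_mean)).
toy_ranked <- toy_summary[order(toy_summary$dispersion_mean, decreasing = TRUE), ]

# Hypothetical helper: return the names of the top n genes from a ranked table.
# head() never returns more entries than exist, so no NA values are introduced.
select_top_genes <- function(ranked_table, n) {
  head(ranked_table$gene, n)
}

top3_toy <- select_top_genes(toy_ranked, 3)
top10_toy <- select_top_genes(toy_ranked, 10)  # only 5 genes are available
```

With a real summary table, the same pattern would be head(summary_hvg$gene, 3000).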

SVG detection

Data

First, we load the 10X Genomics anterior mouse brain dataset.

data("seu_brain")

Before running bayesVG for SVG detection, it’s necessary to normalize the expression data and identify a set of naive HVGs.

seu_brain <- SCTransform(seu_brain,
                         assay = "Spatial",
                         variable.features.n = 3000L,
                         vst.flavor = "v2",
                         return.only.var.genes = FALSE,
                         seed.use = 312,
                         verbose = FALSE)

Modeling

Now we can model gene expression with an approximate multivariate hierarchical Gaussian process (GP), summarize the spatial component of variance for each gene, and classify the 1,000 most spatially variable genes as SVGs. The findSpatiallyVariableFeaturesBayes() function accepts either a Seurat or a SpatialExperiment object.

seu_brain <- findSpatiallyVariableFeaturesBayes(seu_brain, 
                                                naive.hvgs = VariableFeatures(seu_brain), 
                                                kernel = "matern", 
                                                kernel.smoothness = 1.5, 
                                                algorithm = "meanfield", 
                                                n.cores = 4L, 
                                                save.model = TRUE) %>% 
             classifySVGs(n.SVG = 1000L)
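
For intuition about the kernel = "matern" and kernel.smoothness = 1.5 settings, the standard Matérn covariance with smoothness 3/2 can be written in a few lines of base R. This is a sketch of the textbook exact kernel only; how bayesVG parameterizes and approximates the GP internally is not shown here, and the function name, argument names, and toy coordinates below are assumptions made for illustration.

```r
# Textbook Matern covariance with smoothness nu = 3/2:
#   k(d) = amplitude^2 * (1 + sqrt(3) * d / length_scale) * exp(-sqrt(3) * d / length_scale)
# The function and argument names are hypothetical, not bayesVG's API.
matern32 <- function(d, amplitude = 1, length_scale = 1) {
  stopifnot(all(d >= 0), amplitude > 0, length_scale > 0)
  scaled <- sqrt(3) * d / length_scale
  amplitude^2 * (1 + scaled) * exp(-scaled)
}

# Toy 2D spot coordinates (invented for illustration).
coords <- cbind(x = c(0, 1, 0, 3), y = c(0, 0, 1, 4))

# Pairwise Euclidean distances between spots, then the covariance matrix.
dist_mat <- as.matrix(dist(coords))
cov_mat <- matern32(dist_mat, amplitude = 2, length_scale = 1.5)
```

Under this formula, covariance equals amplitude^2 at distance zero and decays as spots move apart, so a larger amplitude corresponds to a stronger spatial component of variance.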

We can extract the summary table, sort it by amplitude rank, and pull the top 1,000 SVGs like so. These genes can then be used as the basis for downstream analyses such as PCA, clustering, and UMAP visualization.

summary_svg <- arrange(seu_brain@assays$SCT@meta.features, amplitude_mean_rank)
top1k_svgs <- summary_svg$gene[1:1000]

Contact information

This package is developed & maintained by Jack R. Leary. Feel free to reach out by opening an issue or by email (j.leary@ufl.edu) if more detailed assistance is needed.
