diff --git a/DESCRIPTION b/DESCRIPTION
index 048a5ab..3eaffc9 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -7,7 +7,7 @@ Authors@R:
            family = "Dolgalev",
            role = c("aut", "cre"),
            email = "igor.dolgalev@nyumc.org")
-Description: A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data involves clustering of cells. Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging if you are not familiar with all the captured subpopulations or have unexpected contaminants. 'clustermole' provides a comprehensive meta collection of cell identity markers for thousands of human and mouse cell types sourced from a variety of databases as well as methods to query them.
+Description: Assignment of cell type labels to single-cell RNA sequencing (scRNA-seq) clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging when unexpected or poorly described populations are present. The clustermole R package provides methods to query thousands of human and mouse cell identity markers sourced from a variety of databases.
 License: MIT + file LICENSE
 URL: https://github.com/igordot/clustermole
 BugReports: https://github.com/igordot/clustermole/issues
diff --git a/R/functions.R b/R/functions.R
index 74d769a..e8b7d98 100644
--- a/R/functions.R
+++ b/R/functions.R
@@ -4,6 +4,7 @@
 #' @param species Species for the appropriate gene symbol format: "hs" for human or "mm" for mouse.
 #'
 #' @return A data frame of markers with one gene per row.
+#'
 #' @import dplyr
 #' @export
 #'
@@ -31,6 +32,7 @@ clustermole_markers <- function(species = "hs") {
 #' @param species Species: "hs" for human or "mm" for mouse.
 #'
 #' @return A data frame of enrichment results with hypergeometric test p-values.
+#'
 #' @import methods
 #' @import dplyr
 #' @importFrom tibble as_tibble
@@ -87,6 +89,7 @@ clustermole_overlaps <- function(genes, species) {
 
   # clean up the enrichment table
   overlaps_tbl <- tibble::as_tibble(overlaps_mat, rownames = "celltype_full")
+  overlaps_tbl <- dplyr::filter(overlaps_tbl, .data$p_value < 0.05)
   overlaps_tbl <- dplyr::inner_join(celltypes_tbl, overlaps_tbl, by = "celltype_full")
   overlaps_tbl <- dplyr::arrange(overlaps_tbl, .data$fdr, .data$p_value, .data$celltype_full)
   overlaps_tbl
@@ -98,12 +101,16 @@ clustermole_overlaps <- function(genes, species) {
 #' @param species Species: "hs" for human or "mm" for mouse.
 #'
 #' @return A data frame of enrichment results.
+#'
 #' @import methods
 #' @import dplyr
 #' @importFrom tibble as_tibble
 #' @importFrom tidyr gather
 #' @importFrom GSVA gsva
 #' @export
+#'
+#' @examples
+#' # my_enrichment <- clustermole_enrichment(expr_mat = my_expr_mat, species = "hs")
 clustermole_enrichment <- function(expr_mat, species) {
 
   # check that the expression matrix seems reasonable
@@ -156,6 +163,7 @@ clustermole_enrichment <- function(expr_mat, species) {
 #' @param gene_label Column name for genes (variable columns of the GMT file) in the output data frame.
 #'
 #' @return A data frame with gene sets as the first column and genes as the second column (one gene per row).
+#'
 #' @import utils
 #' @importFrom tibble enframe
 #' @importFrom tidyr unnest
diff --git a/README.md b/README.md
index 04f8ae6..2e29099 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,9 @@
 
 ![clustermole-book](https://user-images.githubusercontent.com/6363505/72761156-12414280-3ba9-11ea-87de-57ff6cd690bb.png)
 
-A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data includes clustering of cells as one of the steps. Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search.
This is especially challenging for those who are not familiar with all the captured subpopulations or have unexpected contaminants. The clustermole R package provides a comprehensive meta collection of cell identity markers for thousands of human and mouse cell types sourced from a variety of databases as well as methods to query them.
+## About
+
+Assignment of cell type labels to single-cell RNA sequencing (scRNA-seq) clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging when unexpected or poorly described populations are present. The clustermole R package provides methods to query thousands of human and mouse cell identity markers sourced from a variety of databases.
 
 The clustermole package provides three primary features:
 
@@ -14,7 +16,9 @@ The clustermole package provides three primary features:
 * cell type prediction based on a full expression matrix
 * a database of cell type markers
 
----
+## Usage
+
+A [vignette](https://CRAN.R-project.org/package=clustermole/vignettes/clustermole-intro.html) is available with usage examples.
 
 Install clustermole from CRAN:
 
diff --git a/man/clustermole_enrichment.Rd b/man/clustermole_enrichment.Rd
index a50a970..96bd1d0 100644
--- a/man/clustermole_enrichment.Rd
+++ b/man/clustermole_enrichment.Rd
@@ -17,3 +17,6 @@ A data frame of enrichment results.
 \description{
 Perform cell type enrichment for a given gene expression matrix
 }
+\examples{
+# my_enrichment <- clustermole_enrichment(expr_mat = my_expr_mat, species = "hs")
+}
diff --git a/tests/testthat/test-functions.R b/tests/testthat/test-functions.R
index dbd559c..515fb81 100644
--- a/tests/testthat/test-functions.R
+++ b/tests/testthat/test-functions.R
@@ -42,7 +42,7 @@ test_that("clustermole_overlaps() wrong input", {
 test_that("clustermole_overlaps() human input", {
   overlap_tbl <- clustermole_overlaps(genes = gene_names[1:50], species = "hs")
   expect_s3_class(overlap_tbl, "tbl_df")
-  expect_gt(nrow(overlap_tbl), 100)
+  expect_gt(nrow(overlap_tbl), 1)
 })
 
 # gene list for mouse overrepresentation tests
@@ -52,7 +52,7 @@ gene_names <- sample(gene_names)
 test_that("clustermole_overlaps() mouse input", {
   overlap_tbl <- clustermole_overlaps(genes = gene_names[1:50], species = "mm")
   expect_s3_class(overlap_tbl, "tbl_df")
-  expect_gt(nrow(overlap_tbl), 100)
+  expect_gt(nrow(overlap_tbl), 1)
 })
 
 # clustermole_enrichment -----
diff --git a/vignettes/clustermole-intro.Rmd b/vignettes/clustermole-intro.Rmd
index 53bbe46..905a21e 100644
--- a/vignettes/clustermole-intro.Rmd
+++ b/vignettes/clustermole-intro.Rmd
@@ -1,27 +1,38 @@
 ---
 title: "Introduction to clustermole"
+subtitle: "blindly digging for cell types in scRNA-seq clusters"
 output:
   prettydoc::html_pretty:
     keep_md: true
     toc: true
     theme: hpstr
     highlight: github
+    mathjax: null
+    self_contained: true
 vignette: >
   %\VignetteIndexEntry{Introduction to clustermole}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---
 
-```{r, include = FALSE}
+```{r, include=FALSE}
 knitr::opts_chunk$set(
   collapse = TRUE,
   comment = "#>"
 )
 ```
 
+```{css, echo=FALSE}
+.header-title h2 {
+  text-transform: none;
+  font-style: italic;
+  font-size: 1.2rem;
+}
+```
+
 ## Overview
 
-A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data includes clustering of cells as one of the steps.
Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging for those who are not familiar with all the captured subpopulations or have unexpected contaminants. The clustermole R package provides a comprehensive meta collection of cell identity markers for thousands of human and mouse cell types sourced from a variety of databases as well as methods to query them.
+A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data includes clustering of cells as one of the steps. Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging when unexpected or poorly described populations are present. The clustermole R package provides methods to query thousands of human and mouse cell identity markers sourced from a variety of databases.
 
 The clustermole package provides three primary features:
 
@@ -78,7 +89,7 @@ If you need to convert the markers from a data frame to a list format for other
 markers_list = split(x = markers$gene, f = markers$celltype_full)
 ```
 
-## Collection details
+## Database details
 
 We will load dplyr to help with the summary statistics.
 
@@ -86,7 +97,7 @@ We will load dplyr to help with the summary statistics.
 library(dplyr)
 ```
 
-You can use `clustermole_markers()` to retrieve a data frame of all cell type markers in the database.
+You can use `clustermole_markers()` to retrieve a data frame of all cell type markers in the collection.
 
 ```{r get-markers}
 markers = clustermole_markers(species = "hs")