diff --git a/CRAN-RELEASE b/CRAN-RELEASE
deleted file mode 100644
index e36eb10..0000000
--- a/CRAN-RELEASE
+++ /dev/null
@@ -1,2 +0,0 @@
-This package was submitted to CRAN on 2020-01-14.
-Once it is accepted, delete this file and tag the release (commit f7d2e89092).
diff --git a/DESCRIPTION b/DESCRIPTION
index 97e1937..048a5ab 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,7 +1,7 @@
 Type: Package
 Package: clustermole
-Title: Unbiased Cell Type Identification of Single-Cell Transcriptomic Data
-Version: 1.0.0
+Title: Unbiased Single-Cell Transcriptomic Data Cell Type Identification
+Version: 1.0.0.9000
 Authors@R:
     person(given = "Igor",
            family = "Dolgalev",
@@ -24,10 +24,11 @@ Imports:
     utils
 Suggests:
     covr,
-    roxygen2,
-    testthat (>= 2.1.0),
     knitr,
-    rmarkdown
+    prettydoc,
+    rmarkdown,
+    roxygen2,
+    testthat (>= 2.1.0)
 biocViews:
 Encoding: UTF-8
 LazyData: true
diff --git a/README.md b/README.md
index ef9353a..04f8ae6 100644
--- a/README.md
+++ b/README.md
@@ -1,26 +1,28 @@
 # clustermole: blindly digging for cell types in scRNA-seq clusters
 
+[![CRAN](https://www.r-pkg.org/badges/version/clustermole)](https://cran.r-project.org/package=clustermole)
 [![Travis Build Status](https://travis-ci.org/igordot/clustermole.svg?branch=master)](https://travis-ci.org/igordot/clustermole)
 [![codecov](https://codecov.io/gh/igordot/clustermole/branch/master/graph/badge.svg)](https://codecov.io/gh/igordot/clustermole)
 
-> See, children, the misguided Mole.
-> He lives down in a deep, dark hole;
-> Sweetness, and light, and good fresh air
-> Are things for which he does not care.
-> He has not even that makeshift
-> Of feeble minds - the social gift.
-> But say not that he has no soul,
-> Lest haply we misjudge the Mole;
-> Nay, if we measure him by men,
-> No doubt he sits in his dark den
-> Instructing others blind as he
-> Exactly how the world should be.
->
-> -- Oliver Herford
-
-A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data involves clustering of cells. Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging if you are not familiar with all the captured subpopulations or have unexpected contaminants. `clustermole` is an R package that provides a comprehensive meta collection of cell identity markers for thousands of human and mouse cell types sourced from a variety of databases as well as methods to query them.
-
-Install clustermole (development version):
+![clustermole-book](https://user-images.githubusercontent.com/6363505/72761156-12414280-3ba9-11ea-87de-57ff6cd690bb.png)
+
+A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data includes clustering of cells as one of the steps. Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging for those who are not familiar with all the captured subpopulations or have unexpected contaminants. The clustermole R package provides a comprehensive meta collection of cell identity markers for thousands of human and mouse cell types sourced from a variety of databases as well as methods to query them.
+
+The clustermole package provides three primary features:
+
+* cell type prediction based on marker genes
+* cell type prediction based on a full expression matrix
+* a database of cell type markers
+
+---
+
+Install clustermole from CRAN:
+
+```r
+install.packages("clustermole")
+```
+
+Alternatively, you can install the development version from GitHub (not recommended):
 
 ```r
 BiocManager::install("igordot/clustermole", update = FALSE)
@@ -32,20 +34,24 @@ Load clustermole:
 library(clustermole)
 ```
 
-Retrieve a table of all cell type markers:
+Perform cell type overrepresentation analysis for a given set of genes:
 
 ```r
-clustermole_markers(genes, species)
+clustermole_overlaps(genes, species = "hs")
 ```
 
-Perform cell type overrepresentation analysis for a given set of genes:
+Perform cell type enrichment for a given full gene expression matrix:
 
 ```r
-clustermole_overlaps(expr_mat, species)
+clustermole_enrichment(expr_mat, species = "hs")
 ```
 
-Perform cell type enrichment for a given full gene expression matrix:
+Retrieve a table of all cell type markers:
 
 ```r
-clustermole_enrichment(expr_mat, species)
+clustermole_markers(species = "hs")
 ```
+
+---
+
+*Image credit: "A Child's Primer Of Natural History" by Oliver Herford*
diff --git a/vignettes/clustermole-intro.Rmd b/vignettes/clustermole-intro.Rmd
index e928ee1..53bbe46 100644
--- a/vignettes/clustermole-intro.Rmd
+++ b/vignettes/clustermole-intro.Rmd
@@ -1,8 +1,11 @@
 ---
 title: "Introduction to clustermole"
 output:
-  rmarkdown::html_vignette:
+  prettydoc::html_pretty:
     keep_md: true
+    toc: true
+    theme: hpstr
+    highlight: github
 vignette: >
   %\VignetteIndexEntry{Introduction to clustermole}
   %\VignetteEngine{knitr::rmarkdown}
@@ -16,35 +19,33 @@ knitr::opts_chunk$set(
 )
 ```
 
-*Alternative title: blindly digging for cell types in scRNA-seq clusters with clustermole*
-
 ## Overview
 
-A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data involves clustering of cells. Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging if you are not familiar with all the captured subpopulations or have unexpected contaminants. `clustermole` is an R package that provides a comprehensive meta collection of cell identity markers for thousands of human and mouse cell types sourced from a variety of databases as well as methods to query them.
+A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data includes clustering of cells as one of the steps. Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging for those who are not familiar with all the captured subpopulations or have unexpected contaminants. The clustermole R package provides a comprehensive meta collection of cell identity markers for thousands of human and mouse cell types sourced from a variety of databases as well as methods to query them.
 
-The `clustermole` package includes three primary features:
+The clustermole package provides three primary features:
 
-* cell type identification based on a set of marker genes (`clustermole_overlaps`)
-* cell type identification based on a full expression matrix (`clustermole_enrichment`)
-* a meta collection of cell type markers (`clustermole_markers`)
+* cell type prediction based on marker genes (`clustermole_overlaps`)
+* cell type prediction based on a full expression matrix (`clustermole_enrichment`)
+* a database of cell type markers (`clustermole_markers`)
 
 ## Usage
 
-Install `clustermole` if it is not yet available on your system.
+Install clustermole if it is not yet available on your system.
 
 ```{r install-package, eval=FALSE}
 install.packages("clustermole")
 ```
 
-Load `clustermole`.
+Load clustermole.
 
 ```{r load-package, message=FALSE}
 library(clustermole)
 ```
 
-### Overlap a set of genes with cell type markers
+### `clustermole_overlaps()`: cell type prediction based on marker genes
 
-If you have a set of genes (for example, cluster markers), you can perform overrepresentation analysis to see if they overlap any of the known cell type markers.
+If you have a set of genes, such as cluster markers, you can compare them to known cell type markers to see if they overlap any of the known cell type markers (overrepresentation analysis).
 
 ```{r overlaps}
 my_genes = c("CD2", "CD3D", "CD3E", "IL7R", "IL32", "LTB", "LDHB", "CCR7")
@@ -52,17 +53,17 @@ my_overlaps = clustermole_overlaps(genes = my_genes, species = "hs")
 my_overlaps
 ```
 
-### Determine relative enrichment of cell type markers in the input expression data
+### `clustermole_enrichment()`: cell type enrichment in the full expression matrix
 
-If you have a table of expression values (for example, average expression across clusters), you can perform cell type enrichment based on a given gene expression matrix (log-transformed CPM/TPM/FPKM values).
+If you have a table of expression values, such as average expression across clusters, you can perform cell type enrichment based on a given gene expression matrix (log-transformed CPM/TPM/FPKM values). Genes are rows and clusters/samples are columns.
 
 ```{r enrichment, eval=FALSE}
 clustermole_enrichment(expr_mat = my_expr_mat, species = "hs")
 ```
 
-### Retrieve cell type markers
+### `clustermole_markers()`: retrieve cell type markers
 
-You can retrieve a data frame of all cell type markers in the database.
+You can use `clustermole` as a simple database and get a data frame of all cell type markers.
 
 ```{r markers}
 markers = clustermole_markers(species = "hs")
@@ -79,13 +80,13 @@ markers_list = split(x = markers$gene, f = markers$celltype_full)
 
 ## Collection details
 
-We will use `dplyr` to help with summary statistics.
+We will load dplyr to help with the summary statistics.
 
 ```{r load-dplyr, message=FALSE}
 library(dplyr)
 ```
 
-Retrieve a data frame of all cell type markers in the database.
+You can use `clustermole_markers()` to retrieve a data frame of all cell type markers in the database.
 
 ```{r get-markers}
 markers = clustermole_markers(species = "hs")