update docs
igordot committed Jan 21, 2020
1 parent 9ebc78e commit cc31231
Showing 4 changed files with 55 additions and 49 deletions.
2 changes: 0 additions & 2 deletions CRAN-RELEASE

This file was deleted.

11 changes: 6 additions & 5 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Type: Package
Package: clustermole
Title: Unbiased Cell Type Identification of Single-Cell Transcriptomic Data
Version: 1.0.0
Title: Unbiased Single-Cell Transcriptomic Data Cell Type Identification
Version: 1.0.0.9000
Authors@R:
person(given = "Igor",
family = "Dolgalev",
@@ -24,10 +24,11 @@ Imports:
utils
Suggests:
covr,
roxygen2,
testthat (>= 2.1.0),
knitr,
rmarkdown
prettydoc,
rmarkdown,
roxygen2,
testthat (>= 2.1.0)
biocViews:
Encoding: UTF-8
LazyData: true
54 changes: 30 additions & 24 deletions README.md
@@ -1,26 +1,28 @@
# clustermole: blindly digging for cell types in scRNA-seq clusters

[![CRAN](https://www.r-pkg.org/badges/version/clustermole)](https://cran.r-project.org/package=clustermole)
[![Travis Build Status](https://travis-ci.org/igordot/clustermole.svg?branch=master)](https://travis-ci.org/igordot/clustermole)
[![codecov](https://codecov.io/gh/igordot/clustermole/branch/master/graph/badge.svg)](https://codecov.io/gh/igordot/clustermole)

> See, children, the misguided Mole.
> He lives down in a deep, dark hole;
> Sweetness, and light, and good fresh air
> Are things for which he does not care.
> He has not even that makeshift
> Of feeble minds - the social gift.
> But say not that he has no soul,
> Lest haply we misjudge the Mole;
> Nay, if we measure him by men,
> No doubt he sits in his dark den
> Instructing others blind as he
> Exactly how the world should be.
>
> -- Oliver Herford
A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data involves clustering of cells. Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging if you are not familiar with all the captured subpopulations or have unexpected contaminants. `clustermole` is an R package that provides a comprehensive meta collection of cell identity markers for thousands of human and mouse cell types sourced from a variety of databases as well as methods to query them.

Install clustermole (development version):
![clustermole-book](https://user-images.githubusercontent.com/6363505/72761156-12414280-3ba9-11ea-87de-57ff6cd690bb.png)

A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data includes clustering of cells as one of the steps. Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging for those who are not familiar with all the captured subpopulations or have unexpected contaminants. The clustermole R package provides a comprehensive meta collection of cell identity markers for thousands of human and mouse cell types sourced from a variety of databases as well as methods to query them.

The clustermole package provides three primary features:

* cell type prediction based on marker genes
* cell type prediction based on a full expression matrix
* a database of cell type markers

---

Install clustermole from CRAN:

```r
install.packages("clustermole")
```

Alternatively, you can install the development version from GitHub (not recommended):

```r
BiocManager::install("igordot/clustermole", update = FALSE)
@@ -32,20 +34,24 @@ Load clustermole:
library(clustermole)
```

Retrieve a table of all cell type markers:
Perform cell type overrepresentation analysis for a given set of genes:

```r
clustermole_markers(genes, species)
clustermole_overlaps(genes, species = "hs")
```

Perform cell type overrepresentation analysis for a given set of genes:
Perform cell type enrichment for a given full gene expression matrix:

```r
clustermole_overlaps(expr_mat, species)
clustermole_enrichment(expr_mat, species = "hs")
```

Perform cell type enrichment for a given full gene expression matrix:
Retrieve a table of all cell type markers:

```r
clustermole_enrichment(expr_mat, species)
clustermole_markers(species = "hs")
```

---

*Image credit: "A Child's Primer Of Natural History" by Oliver Herford*
37 changes: 19 additions & 18 deletions vignettes/clustermole-intro.Rmd
@@ -1,8 +1,11 @@
---
title: "Introduction to clustermole"
output:
rmarkdown::html_vignette:
prettydoc::html_pretty:
keep_md: true
toc: true
theme: hpstr
highlight: github
vignette: >
%\VignetteIndexEntry{Introduction to clustermole}
%\VignetteEngine{knitr::rmarkdown}
@@ -16,53 +19,51 @@ knitr::opts_chunk$set(
)
```

*Alternative title: blindly digging for cell types in scRNA-seq clusters with clustermole*

## Overview

A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data involves clustering of cells. Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging if you are not familiar with all the captured subpopulations or have unexpected contaminants. `clustermole` is an R package that provides a comprehensive meta collection of cell identity markers for thousands of human and mouse cell types sourced from a variety of databases as well as methods to query them.
A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data includes clustering of cells as one of the steps. Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging for those who are not familiar with all the captured subpopulations or have unexpected contaminants. The clustermole R package provides a comprehensive meta collection of cell identity markers for thousands of human and mouse cell types sourced from a variety of databases as well as methods to query them.

The `clustermole` package includes three primary features:
The clustermole package provides three primary features:

* cell type identification based on a set of marker genes (`clustermole_overlaps`)
* cell type identification based on a full expression matrix (`clustermole_enrichment`)
* a meta collection of cell type markers (`clustermole_markers`)
* cell type prediction based on marker genes (`clustermole_overlaps`)
* cell type prediction based on a full expression matrix (`clustermole_enrichment`)
* a database of cell type markers (`clustermole_markers`)

## Usage

Install `clustermole` if it is not yet available on your system.
Install clustermole if it is not yet available on your system.

```{r install-package, eval=FALSE}
install.packages("clustermole")
```

Load `clustermole`.
Load clustermole.

```{r load-package, message=FALSE}
library(clustermole)
```

### Overlap a set of genes with cell type markers
### `clustermole_overlaps()`: cell type prediction based on marker genes

If you have a set of genes (for example, cluster markers), you can perform overrepresentation analysis to see if they overlap any of the known cell type markers.
If you have a set of genes, such as cluster markers, you can check whether they overlap known cell type markers (overrepresentation analysis).

```{r overlaps}
my_genes = c("CD2", "CD3D", "CD3E", "IL7R", "IL32", "LTB", "LDHB", "CCR7")
my_overlaps = clustermole_overlaps(genes = my_genes, species = "hs")
my_overlaps
```
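
For intuition about what an overrepresentation analysis measures, here is a minimal base R sketch using a hypergeometric test. It is an illustration only and is not taken from the clustermole source: the `tcell_markers` set and the `universe_size` of 20,000 genes are hypothetical values chosen for the example, and the statistic that `clustermole_overlaps()` actually reports may be computed differently.

```r
# Query genes (the same set as in the example above).
my_genes = c("CD2", "CD3D", "CD3E", "IL7R", "IL32", "LTB", "LDHB", "CCR7")

# Hypothetical marker set for one cell type (assumption, not from the clustermole database).
tcell_markers = c("CD2", "CD3D", "CD3E", "CD3G", "IL7R", "CCR7", "TRAC", "LCK")

# Assumed number of genes in the background universe.
universe_size = 20000

# Number of query genes that are also markers.
n_overlap = length(intersect(my_genes, tcell_markers))

# Probability of seeing at least n_overlap shared genes by chance
# when drawing length(my_genes) genes from the universe.
p_value = phyper(
  q = n_overlap - 1,
  m = length(tcell_markers),
  n = universe_size - length(tcell_markers),
  k = length(my_genes),
  lower.tail = FALSE
)

n_overlap
p_value
```

A small p-value suggests that the query genes share more markers with the cell type than expected by chance. In practice the test is repeated for every cell type in the collection, so the p-values should be adjusted for multiple testing.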

### Determine relative enrichment of cell type markers in the input expression data
### `clustermole_enrichment()`: cell type enrichment in the full expression matrix

If you have a table of expression values (for example, average expression across clusters), you can perform cell type enrichment based on a given gene expression matrix (log-transformed CPM/TPM/FPKM values).
If you have a table of expression values, such as average expression across clusters, you can perform cell type enrichment based on a given gene expression matrix (log-transformed CPM/TPM/FPKM values). Genes are rows and clusters/samples are columns.

```{r enrichment, eval=FALSE}
clustermole_enrichment(expr_mat = my_expr_mat, species = "hs")
```
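
The chunk above assumes that `my_expr_mat` already exists. The following base R sketch shows one way such a matrix could be assembled, with genes as rows and clusters as columns. The counts and the cluster assignments are simulated placeholders (assumptions for illustration), and log2(CPM + 1) is used as one common choice of log transformation.

```r
# Simulated raw counts (assumption): 4 genes in rows, 10 cells in columns.
set.seed(1)
counts = matrix(
  rpois(40, lambda = 5),
  nrow = 4,
  dimnames = list(c("CD3D", "CD3E", "IL7R", "CCR7"), paste0("cell", 1:10))
)

# Hypothetical cluster assignment for each cell.
clusters = rep(c("c1", "c2"), each = 5)

# Scale each cell to counts per million and log-transform.
cpm = t(t(counts) / colSums(counts)) * 1e6
log_cpm = log2(cpm + 1)

# Average the log-transformed expression across the cells of each cluster.
my_expr_mat = sapply(
  split(seq_along(clusters), clusters),
  function(idx) rowMeans(log_cpm[, idx, drop = FALSE])
)

my_expr_mat
```

The result is a numeric matrix with gene symbols as row names and one column per cluster, which matches the orientation described above.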

### Retrieve cell type markers
### `clustermole_markers()`: retrieve cell type markers

You can retrieve a data frame of all cell type markers in the database.
You can use `clustermole` as a simple database and get a data frame of all cell type markers.

```{r markers}
markers = clustermole_markers(species = "hs")
@@ -79,13 +80,13 @@ markers_list = split(x = markers$gene, f = markers$celltype_full)

## Collection details

We will use `dplyr` to help with summary statistics.
We will load dplyr to help with the summary statistics.

```{r load-dplyr, message=FALSE}
library(dplyr)
```

Retrieve a data frame of all cell type markers in the database.
You can use `clustermole_markers()` to retrieve a data frame of all cell type markers in the database.

```{r get-markers}
markers = clustermole_markers(species = "hs")