update docs
igordot committed Jan 26, 2020
1 parent cc31231 commit 6f772b2
Showing 6 changed files with 35 additions and 9 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -7,7 +7,7 @@ Authors@R:
family = "Dolgalev",
role = c("aut", "cre"),
email = "igor.dolgalev@nyumc.org")
-Description: A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data involves clustering of cells. Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging if you are not familiar with all the captured subpopulations or have unexpected contaminants. 'clustermole' provides a comprehensive meta collection of cell identity markers for thousands of human and mouse cell types sourced from a variety of databases as well as methods to query them.
+Description: Assignment of cell type labels to single-cell RNA sequencing (scRNA-seq) clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging when unexpected or poorly described populations are present. The clustermole R package provides methods to query thousands of human and mouse cell identity markers sourced from a variety of databases.
License: MIT + file LICENSE
URL: https://github.com/igordot/clustermole
BugReports: https://github.com/igordot/clustermole/issues
8 changes: 8 additions & 0 deletions R/functions.R
@@ -4,6 +4,7 @@
#' @param species Species for the appropriate gene symbol format: "hs" for human or "mm" for mouse.
#'
#' @return A data frame of markers with one gene per row.
#'
#' @import dplyr
#' @export
#'
@@ -31,6 +32,7 @@ clustermole_markers <- function(species = "hs") {
#' @param species Species: "hs" for human or "mm" for mouse.
#'
#' @return A data frame of enrichment results with hypergeometric test p-values.
#'
#' @import methods
#' @import dplyr
#' @importFrom tibble as_tibble
@@ -87,6 +89,7 @@ clustermole_overlaps <- function(genes, species) {

# clean up the enrichment table
overlaps_tbl <- tibble::as_tibble(overlaps_mat, rownames = "celltype_full")
overlaps_tbl <- dplyr::filter(overlaps_tbl, .data$p_value < 0.05)
overlaps_tbl <- dplyr::inner_join(celltypes_tbl, overlaps_tbl, by = "celltype_full")
overlaps_tbl <- dplyr::arrange(overlaps_tbl, .data$fdr, .data$p_value, .data$celltype_full)
overlaps_tbl
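
The hunk above adds a `p_value < 0.05` filter before the table is joined and sorted. As a rough illustration of what a hypergeometric over-representation p-value and such a filter look like, here is a base R sketch. All numbers and cell type names are made-up assumptions, and this is not the package's internal code; only the column names `celltype_full`, `p_value`, and `fdr` are taken from the diff.

```r
# Toy inputs (assumptions for illustration only).
universe_size <- 20000                                                # background genes
set_sizes <- c(celltype_a = 100, celltype_b = 150, celltype_c = 200)  # markers per cell type
overlaps <- c(celltype_a = 10, celltype_b = 1, celltype_c = 0)        # query genes found in each set
query_size <- 50                                                      # genes in the query list

# Upper-tail hypergeometric probability: P(X >= observed overlap).
p_value <- phyper(
  q = overlaps - 1,
  m = set_sizes,
  n = universe_size - set_sizes,
  k = query_size,
  lower.tail = FALSE
)

toy_tbl <- data.frame(
  celltype_full = names(set_sizes),
  overlap = overlaps,
  p_value = p_value,
  fdr = p.adjust(p_value, method = "fdr"),
  row.names = NULL
)

# Keep nominally significant rows, then sort by the same keys as the diff.
toy_tbl <- toy_tbl[toy_tbl$p_value < 0.05, ]
toy_tbl <- toy_tbl[order(toy_tbl$fdr, toy_tbl$p_value, toy_tbl$celltype_full), ]
toy_tbl
```

Filtering on the unadjusted p-value shrinks the returned table considerably, which may be why the row-count expectations in the tests are relaxed in the same commit.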
@@ -98,12 +101,16 @@ clustermole_overlaps <- function(genes, species) {
#' @param species Species: "hs" for human or "mm" for mouse.
#'
#' @return A data frame of enrichment results.
#'
#' @import methods
#' @import dplyr
#' @importFrom tibble as_tibble
#' @importFrom tidyr gather
#' @importFrom GSVA gsva
#' @export
#'
#' @examples
#' # my_enrichment <- clustermole_enrichment(expr_mat = my_expr_mat, species = "hs")
clustermole_enrichment <- function(expr_mat, species) {

# check that the expression matrix seems reasonable
@@ -156,6 +163,7 @@ clustermole_enrichment <- function(expr_mat, species) {
#' @param gene_label Column name for genes (variable columns of the GMT file) in the output data frame.
#'
#' @return A data frame with gene sets as the first column and genes as the second column (one gene per row).
#'
#' @import utils
#' @importFrom tibble enframe
#' @importFrom tidyr unnest
8 changes: 6 additions & 2 deletions README.md
@@ -6,15 +6,19 @@

![clustermole-book](https://user-images.githubusercontent.com/6363505/72761156-12414280-3ba9-11ea-87de-57ff6cd690bb.png)

-A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data includes clustering of cells as one of the steps. Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging for those who are not familiar with all the captured subpopulations or have unexpected contaminants. The clustermole R package provides a comprehensive meta collection of cell identity markers for thousands of human and mouse cell types sourced from a variety of databases as well as methods to query them.
+## About
+
+Assignment of cell type labels to single-cell RNA sequencing (scRNA-seq) clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging when unexpected or poorly described populations are present. The clustermole R package provides methods to query thousands of human and mouse cell identity markers sourced from a variety of databases.

The clustermole package provides three primary features:

* cell type prediction based on marker genes
* cell type prediction based on a full expression matrix
* a database of cell type markers

----
+## Usage
+
+A [vignette](https://CRAN.R-project.org/package=clustermole/vignettes/clustermole-intro.html) is available with usage examples.

Install clustermole from CRAN:

3 changes: 3 additions & 0 deletions man/clustermole_enrichment.Rd

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions tests/testthat/test-functions.R
@@ -42,7 +42,7 @@ test_that("clustermole_overlaps() wrong input", {
test_that("clustermole_overlaps() human input", {
overlap_tbl <- clustermole_overlaps(genes = gene_names[1:50], species = "hs")
expect_s3_class(overlap_tbl, "tbl_df")
-expect_gt(nrow(overlap_tbl), 100)
+expect_gt(nrow(overlap_tbl), 1)
})

# gene list for mouse overrepresentation tests
@@ -52,7 +52,7 @@ gene_names <- sample(gene_names)
test_that("clustermole_overlaps() mouse input", {
overlap_tbl <- clustermole_overlaps(genes = gene_names[1:50], species = "mm")
expect_s3_class(overlap_tbl, "tbl_df")
-expect_gt(nrow(overlap_tbl), 100)
+expect_gt(nrow(overlap_tbl), 1)
})

# clustermole_enrichment -----
19 changes: 15 additions & 4 deletions vignettes/clustermole-intro.Rmd
@@ -1,27 +1,38 @@
---
title: "Introduction to clustermole"
subtitle: "blindly digging for cell types in scRNA-seq clusters"
output:
  prettydoc::html_pretty:
    keep_md: true
    toc: true
    theme: hpstr
    highlight: github
    mathjax: null
    self_contained: true
vignette: >
  %\VignetteIndexEntry{Introduction to clustermole}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{css, echo=FALSE}
.header-title h2 {
  text-transform: none;
  font-style: italic;
  font-size: 1.2rem;
}
```

## Overview

-A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data includes clustering of cells as one of the steps. Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging for those who are not familiar with all the captured subpopulations or have unexpected contaminants. The clustermole R package provides a comprehensive meta collection of cell identity markers for thousands of human and mouse cell types sourced from a variety of databases as well as methods to query them.
+A typical computational pipeline to process single-cell RNA sequencing (scRNA-seq) data includes clustering of cells as one of the steps. Assignment of cell type labels to those clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging when unexpected or poorly described populations are present. The clustermole R package provides methods to query thousands of human and mouse cell identity markers sourced from a variety of databases.

The clustermole package provides three primary features:

@@ -78,15 +89,15 @@ If you need to convert the markers from a data frame to a list format for other
markers_list = split(x = markers$gene, f = markers$celltype_full)
```
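
To make the conversion above concrete, here is a small sketch on a toy data frame. The gene symbols and cell type names are made up for illustration; only the `gene` and `celltype_full` column names come from the vignette.

```r
# Toy stand-in for the markers data frame (one gene per row).
markers <- data.frame(
  gene = c("CD3E", "CD3D", "CD19", "MS4A1"),
  celltype_full = c("T cell", "T cell", "B cell", "B cell"),
  stringsAsFactors = FALSE
)

# split() groups the gene vector by cell type and returns a named list
# with one character vector of genes per cell type.
markers_list <- split(x = markers$gene, f = markers$celltype_full)

names(markers_list)
markers_list[["T cell"]]
```

A named list of gene vectors like this is the shape many gene set tools expect as input, which is presumably why the vignette shows the conversion.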

-## Collection details
+## Database details

We will load dplyr to help with the summary statistics.

```{r load-dplyr, message=FALSE}
library(dplyr)
```

-You can use `clustermole_markers()` to retrieve a data frame of all cell type markers in the database.
+You can use `clustermole_markers()` to retrieve a data frame of all cell type markers in the collection.

```{r get-markers}
markers = clustermole_markers(species = "hs")
