---
output: github_document
bibliography: references.bib
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# HarmonizomeR

<!-- badges: start -->

<!-- badges: end -->

`{HarmonizomeR}` provides a fast interface for downloading gene sets from the [Harmonizome](https://maayanlab.cloud/Harmonizome/) database [@rouillard2016] and running functional and gene set enrichment analyses on them.

## Installation

You can install the development version of `{HarmonizomeR}` from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("marceelrf/HarmonizomeR")
```

## Example

```{r,echo=FALSE}
# Load the package functions and bundled data directly from the source tree
for (f in list.files(path = "R/", pattern = "\\.R$", full.names = TRUE)) {
  source(f)
}
for (f in list.files(path = "data/", pattern = "\\.rda$", full.names = TRUE)) {
  load(f)
}
```

The function `show_dataset_collection()` lists all available datasets.

```{r}
show_dataset_collection()
```

Let's download the [*SILAC Phosphoproteomics Signatures of Differentially Phosphorylated Proteins for Drugs* dataset](https://maayanlab.cloud/Harmonizome/dataset/SILAC+Phosphoproteomics+Signatures+of+Differentially+Phosphorylated+Proteins+for+Drugs) by passing its code, **silacdrug**, to the function `get_geneset()`.

**CAUTION**: Depending on the size of the dataset and the speed of your internet connection, this step can take a long time.

```{r}
silacdrug_ds <- get_geneset(code = "silacdrug")
```

```{r}
head(silacdrug_ds)
```
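Because the download can be slow, it may be worth caching the result on disk so that later sessions skip it. The helper below is a minimal sketch using only base R; `cache_rds()` and the file name are hypothetical and not part of `{HarmonizomeR}`.

``` r
# Hypothetical helper (not part of HarmonizomeR):
# call `fetch()` only when `path` does not exist yet, then reuse the saved copy.
cache_rds <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))  # reuse the copy saved by an earlier call
  }
  result <- fetch()        # slow step, e.g. a download
  saveRDS(result, path)    # store it for the next session
  result
}
```

With this helper, the download above could be written as `silacdrug_ds <- cache_rds("silacdrug_ds.rds", function() get_geneset(code = "silacdrug"))`.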

## Enrichment analysis

### Over-representation analysis (ORA)

The function `EnrichHarmonizome()` performs the ORA using the `{clusterProfiler}` package [@clusterProfiler].

```{r}
genes <- c("CYP2D26","NCOA7","CCDC3","SNTG2","LIMK1","PPWD1","2900055J20RIK","GM839","HSPA12A","MTIF3","KDM2B",
           "FAM221A","GM19710","CCDC68","CNRIP1","GM7544","LGI2","CLIP3","GM9484","1700034J05RIK","RIPK2","DPF2","RPS6KA4","RUNX1","DNM1L","SGTA","PIP5K1B","MTA1","KIAA1524","NCOR2",
           "HSP90AB1","ARFIP2","DKC1","KMT2A","RPLP2","PLEC","HSP90AA1","PEAK1","ZDHHC5","TBC1D25")

ORA <- EnrichHarmonizome(gene = genes,
                         tbl = silacdrug_ds,
                         pvalueCutoff = 0.05,
                         pAdjustMethod = "BH",
                         minGSSize = 5,
                         maxGSSize = 5000)

head(ORA@result)
```
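For intuition about the p-values and the `pAdjustMethod = "BH"` argument, ORA is commonly based on a one-sided hypergeometric test followed by multiple-testing correction. The sketch below reproduces that calculation in base R with made-up counts; it assumes this is the test applied behind `EnrichHarmonizome()`, and none of the numbers come from the `silacdrug` dataset.

``` r
# Toy counts, invented for illustration only
N <- 20000  # genes in the background universe
M <- 150    # background genes that belong to the gene set
n <- 40     # genes in the query list
k <- 5      # query genes that belong to the gene set

# One-sided hypergeometric test: probability of seeing k or more hits by chance
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# Benjamini-Hochberg adjustment across several gene sets
# (set_b and set_c are arbitrary p-values standing in for other gene sets)
p_values <- c(set_a = p_value, set_b = 0.03, set_c = 0.40)
p_adjusted <- p.adjust(p_values, method = "BH")
p_adjusted
```

Gene sets whose adjusted p-value falls below `pvalueCutoff` would be the ones reported as enriched.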

### Gene set enrichment analysis (GSEA)

To perform GSEA, the gene set table must first be converted to a named list. The function `geneset_to_list()` handles this conversion.

The function `GSEAHarmonizome()` uses the `{fgsea}` package to run the GSEA algorithm [@fgsea; @subramanian2005; @mootha2003].

```{r}
pathways_list <- geneset_to_list(tbl = silacdrug_ds)

length(pathways_list[[1]])

pathways_list[1]
```
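To illustrate the target structure, the sketch below builds a named list from a toy long-format table with base R. The column names `geneset` and `gene` are hypothetical; the actual layout of the table returned by `get_geneset()` and the internals of `geneset_to_list()` may differ.

``` r
# Toy long-format table; column names are assumptions made for this example
toy_tbl <- data.frame(
  geneset = c("drug_A", "drug_A", "drug_B", "drug_B", "drug_B"),
  gene    = c("LIMK1", "RIPK2", "RUNX1", "MTA1", "PLEC"),
  stringsAsFactors = FALSE
)

# One list element per gene set, each holding a character vector of genes
toy_list <- split(toy_tbl$gene, toy_tbl$geneset)

names(toy_list)    # gene set names
lengths(toy_list)  # number of genes in each set
```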

```{r}
data("example_gsea")
GSEA <- GSEAHarmonizome(pathways = pathways_list,
                        stats = gene_ranks,
                        minSize = 10,
                        maxSize = 500)

head(GSEA)
```
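When using your own data, `{fgsea}` expects `stats` to be a named numeric vector of gene-level statistics whose names match the gene identifiers in `pathways`. The sketch below builds such a vector from a toy differential-expression table; the column names and values are invented, and it assumes `GSEAHarmonizome()` passes `stats` to `{fgsea}` unchanged.

``` r
# Toy differential-expression results; column names and values are hypothetical
toy_de <- data.frame(
  gene   = c("LIMK1", "RIPK2", "RUNX1", "MTA1", "PLEC"),
  log2fc = c(1.8, -0.4, 2.3, -1.1, 0.2),
  stringsAsFactors = FALSE
)

# Named numeric vector: values are the ranking statistic, names are the genes
toy_ranks <- setNames(toy_de$log2fc, toy_de$gene)

# Order from most up-regulated to most down-regulated
toy_ranks <- sort(toy_ranks, decreasing = TRUE)
toy_ranks
```

Gene names should be unique, and they should use the same identifier type and letter case as the gene sets.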

### Gene set variation analysis (GSVA)

`GSVAHarmonizome()` wraps the `{GSVA}` package to perform the GSVA algorithm [@Hänzelmann2013; @GSVA].

```{r}
data("example_gsva")

GSVA <- GSVAHarmonizome(expr = example_expr, pathways = pathways_list)
```
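`{GSVA}` works on an expression matrix with genes in rows and samples in columns. Before running it on your own data, it can help to check how many genes of each gene set are actually present in the matrix. The sketch below does this with toy objects; every name and value is invented, and the layout of `example_expr` is assumed to follow the same convention.

``` r
set.seed(1)

# Toy expression matrix: 5 genes (rows) x 4 samples (columns), random values
toy_expr <- matrix(
  rnorm(5 * 4),
  nrow = 5,
  dimnames = list(
    c("LIMK1", "RIPK2", "RUNX1", "MTA1", "PLEC"),
    paste0("sample_", 1:4)
  )
)

# Toy gene sets; DKC1 is deliberately missing from the matrix
toy_pathways <- list(
  drug_A = c("LIMK1", "RIPK2"),
  drug_B = c("RUNX1", "MTA1", "PLEC", "DKC1")
)

# Fraction of each gene set found among the matrix row names
coverage <- vapply(
  toy_pathways,
  function(genes) mean(genes %in% rownames(toy_expr)),
  numeric(1)
)
coverage
```

Gene sets with low coverage usually point to mismatched identifiers, such as different letter case or a different identifier type.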

## Funding

The authors thank [FAPESP](https://fapesp.br/) (grant no. 2018/05731-7) for the funding.

## References