---
output: github_document
bibliography: references.bib
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# HarmonizomeR
<!-- badges: start -->
<!-- badges: end -->
The goal of `{HarmonizomeR}` is to provide a fast interface for downloading gene sets from the [Harmonizome](https://maayanlab.cloud/Harmonizome/) database [@rouillard2016] and running functional and gene set enrichment analyses on them.
## Installation
You can install the development version of `{HarmonizomeR}` from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("marceelrf/HarmonizomeR")
```
## Example
```{r,echo=FALSE}
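# Source the package functions and load the bundled data for this README.
# `pattern` is a regular expression, so the extension is anchored explicitly.
for (f in list.files(path = "R/", pattern = "\\.R$", full.names = TRUE)) {
  source(f)
}
for (f in list.files(path = "data/", pattern = "\\.rda$", full.names = TRUE)) {
  load(f)
}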
```
Use `show_dataset_collection()` to list all available datasets.
```{r}
show_dataset_collection()
```
Let's download the [*SILAC Phosphoproteomics Signatures of Differentially Phosphorylated Proteins for Drugs* dataset](https://maayanlab.cloud/Harmonizome/dataset/SILAC+Phosphoproteomics+Signatures+of+Differentially+Phosphorylated+Proteins+for+Drugs) by passing its code, **silacdrug**, to `get_geneset()`.
**CAUTION**: Depending on the size of the dataset and the speed of your internet connection, this step can take a long time.
```{r}
silacdrug_ds <- get_geneset(code = "silacdrug")
```
```{r}
head(silacdrug_ds)
```
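Because the download can be slow, it may be worth caching the result on disk. The following is a minimal sketch and not part of `{HarmonizomeR}`: `cached_fetch()`, the cache file path, and the `term`/`gene` columns are hypothetical names, and the stand-in `slow_fetch()` replaces the real `get_geneset()` call so the example runs offline.

```r
# Hypothetical helper (not part of HarmonizomeR): run `fetch` once,
# store the result as an .rds file, and reuse it on later calls.
cached_fetch <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))   # cache hit: skip the slow step
  }
  result <- fetch()
  saveRDS(result, path)     # cache miss: store the result for next time
  result
}

# Stand-in for a slow download; counts how often it is really called.
n_calls <- 0
slow_fetch <- function() {
  n_calls <<- n_calls + 1
  data.frame(term = c("drugA", "drugA"), gene = c("RUNX1", "PLEC"))
}

cache_file <- tempfile(fileext = ".rds")
first  <- cached_fetch(cache_file, slow_fetch)  # calls slow_fetch()
second <- cached_fetch(cache_file, slow_fetch)  # read back from disk
```

With the real package, the call would presumably look like `cached_fetch("silacdrug.rds", function() get_geneset(code = "silacdrug"))`.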
## Enrichment analysis
### Over-representation analysis (ORA)
`EnrichHarmonizome()` performs the ORA using the `{clusterProfiler}` package [@clusterProfiler].
```{r}
genes <- c(
  "CYP2D26", "NCOA7", "CCDC3", "SNTG2", "LIMK1", "PPWD1", "2900055J20RIK",
  "GM839", "HSPA12A", "MTIF3", "KDM2B", "FAM221A", "GM19710", "CCDC68",
  "CNRIP1", "GM7544", "LGI2", "CLIP3", "GM9484", "1700034J05RIK", "RIPK2",
  "DPF2", "RPS6KA4", "RUNX1", "DNM1L", "SGTA", "PIP5K1B", "MTA1", "KIAA1524",
  "NCOR2", "HSP90AB1", "ARFIP2", "DKC1", "KMT2A", "RPLP2", "PLEC", "HSP90AA1",
  "PEAK1", "ZDHHC5", "TBC1D25"
)
ORA <- EnrichHarmonizome(gene = genes,
                         tbl = silacdrug_ds,
                         pvalueCutoff = 0.05,
                         pAdjustMethod = "BH",
                         minGSSize = 5,
                         maxGSSize = 5000)
head(ORA@result)
```
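For intuition, over-representation analysis for a single gene set reduces to a one-sided hypergeometric test. The sketch below uses only base R and made-up counts; it illustrates the statistic and the `"BH"` adjustment, not the internals of `EnrichHarmonizome()` or `{clusterProfiler}`.

```r
# Toy counts (all hypothetical).
universe_size <- 20000  # genes in the background
set_size      <- 100    # genes annotated to the gene set
query_size    <- 40     # genes in the input list
overlap       <- 5      # input genes that fall in the gene set

# P(X >= overlap) when drawing query_size genes without replacement.
p_value <- phyper(overlap - 1,
                  m = set_size,
                  n = universe_size - set_size,
                  k = query_size,
                  lower.tail = FALSE)

# The same test written as a one-sided Fisher exact test on a 2x2 table
# (rows: in set / not in set; columns: in query / not in query).
contingency <- matrix(c(overlap,
                        query_size - overlap,
                        set_size - overlap,
                        universe_size - set_size - (query_size - overlap)),
                      nrow = 2)
p_fisher <- fisher.test(contingency, alternative = "greater")$p.value

# Benjamini-Hochberg adjustment across several gene sets; the two extra
# p-values are invented to stand in for other gene sets.
p_adjusted <- p.adjust(c(p_value, 0.20, 0.65), method = "BH")
```

Both formulations give the same p-value; the adjustment step matters because one such test is run per gene set.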
### Gene set enrichment analysis (GSEA)
To perform GSEA, the gene set table must first be converted to a named list. The function `geneset_to_list()` handles this task.
The function `GSEAHarmonizome()` uses the `{fgsea}` package to run the GSEA algorithm [@fgsea; @subramanian2005; @mootha2003].
```{r}
pathways_list <- geneset_to_list(tbl = silacdrug_ds)
length(pathways_list[[1]])
pathways_list[1]
```
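Conceptually, this conversion groups gene symbols by gene set name. A minimal base R sketch follows, assuming a long-format table with one row per gene/gene-set pair. The column names `term` and `gene` are assumptions for illustration and may not match what `get_geneset()` returns, and `split()` is a stand-in rather than the implementation of `geneset_to_list()`.

```r
# Toy long-format table (hypothetical column names and values).
toy_tbl <- data.frame(
  term = c("drugA", "drugA", "drugB", "drugB", "drugB"),
  gene = c("RUNX1", "PLEC", "RUNX1", "LIMK1", "DKC1")
)

# split() groups the gene column by gene set, giving a named list of
# character vectors with one element per gene set.
toy_pathways <- split(toy_tbl$gene, toy_tbl$term)

names(toy_pathways)    # gene set names
lengths(toy_pathways)  # number of genes per set
```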
```{r}
data("example_gsea")
GSEA <- GSEAHarmonizome(pathways = pathways_list,
                        stats = gene_ranks,
                        minSize = 10,
                        maxSize = 500)
head(GSEA)
```
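The example above relies on the `gene_ranks` object, which appears to come from `example_gsea`. For your own data, `{fgsea}` expects `stats` to be a named numeric vector of gene-level statistics. The sketch below builds one from a hypothetical differential-expression table; the table, its column names, and the choice of log2 fold change as the ranking statistic are all assumptions for illustration.

```r
# Hypothetical differential-expression results (made-up values).
de_tbl <- data.frame(
  gene   = c("RUNX1", "PLEC", "LIMK1", "DKC1", "RUNX1"),
  log2fc = c(1.8, -0.4, 0.9, -2.1, 1.2)
)

# Keep one statistic per gene: sort by absolute value, then drop
# later duplicates so the strongest entry for each gene survives.
de_tbl <- de_tbl[order(-abs(de_tbl$log2fc)), ]
de_tbl <- de_tbl[!duplicated(de_tbl$gene), ]

# Named numeric vector, sorted from most up- to most down-regulated.
toy_ranks <- sort(setNames(de_tbl$log2fc, de_tbl$gene), decreasing = TRUE)
toy_ranks
```

Gene names must be unique and must use the same identifiers as the gene sets, otherwise no genes will match.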
### Gene set variation analysis (GSVA)
`GSVAHarmonizome()` wraps the `{GSVA}` package to run the GSVA algorithm [@Hänzelmann2013; @GSVA].
```{r}
data("example_gsva")
GSVA <- GSVAHarmonizome(expr = example_expr, pathways = pathways_list)
```
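`{GSVA}` works on an expression matrix with genes in rows and samples in columns, and the row names need to use the same identifiers as the gene sets. The sketch below uses simulated values and hypothetical names to check that overlap before running the analysis; it makes no assumption about the contents of `example_expr`.

```r
set.seed(1)

# Simulated expression matrix: 5 genes (rows) x 4 samples (columns).
toy_expr <- matrix(
  rnorm(5 * 4),
  nrow = 5,
  dimnames = list(c("RUNX1", "PLEC", "LIMK1", "DKC1", "MTA1"),
                  paste0("sample", 1:4))
)

# Hypothetical gene sets in the named-list format used above.
toy_sets <- list(
  drugA = c("RUNX1", "PLEC", "KMT2A"),
  drugB = c("LIMK1", "DKC1", "MTA1")
)

# Fraction of each gene set that is present in the expression matrix.
coverage <- vapply(toy_sets,
                   function(g) mean(g %in% rownames(toy_expr)),
                   numeric(1))
coverage
```

Gene sets with low coverage are scored on only a few genes, so it can help to inspect this before interpreting the results.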
## Funding
The authors thank [FAPESP](https://fapesp.br/) (grant no. 2018/05731-7) for funding.
## References