-
Notifications
You must be signed in to change notification settings - Fork 8
/
Copy pathhicClust.Rmd
127 lines (95 loc) · 3.92 KB
/
hicClust.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
---
title: "Clustering of Hi-C contact maps"
author: "Shubham Chaturvedi, Pierre Neuvial, Nathalie Vialaneix"
date: "`r Sys.Date()`"
output:
html_document:
toc: yes
vignette: >
%\VignetteIndexEntry{Clustering of Hi-C contact maps}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r skipNoHITC}
# IMPORTANT: this vignette can not be created if HiTC is not installed
if (!require("HiTC", quietly = TRUE)) {
knitr::opts_chunk$set(eval = FALSE)
}
```
# Introduction
Hi-C is a sequencing-based molecular assay designed to measure intra and
inter-chromosomal interactions between the DNA molecule. In particular, the
identification of Topologically-Associated Domains (TADs), that is, of regions
of the genome in which physical interactions are frequent, provides insight
into the three-dimensional organization of a genome [2].
Hi-C data are in the form of two-dimensional *contact maps*, *i.e.*, matrices
whose $i,j$ entry quantifies the intensity of the physical interaction between
two genome regions $i$ and $j$ at the DNA level. In this vignette, we
demonstrate the use of `adjclust::hicClust` to perform adjacency-constrained
hierarchical agglomerative clustering (HAC) of Hi-C contact maps. The output of
this function is a dendrogram, which can be cut to identify TADs. The algorithm
used for adjacency-constrained (HAC) is described in [3,4].
```{r loadLib, message=FALSE}
library("adjclust")
```
# Loading and displaying a sample Hi-C contact map
The data set `hic_imr90_40_XX` is an object of class `HTCexp` which has been
obtained from the `HiTC` package [4]. It is a contact map corresponding to the
first 500 x 500 bins on chromosome X vs chromosome X.
```{r loadData}
load(system.file("extdata", "hic_imr90_40_XX.rda", package = "adjclust"))
```
The script used to create this map can be found by executing the following command:
```{r create-data-script}
system.file("system/create_hic_chrXchrX.R", package="adjclust")
```
Now we have a look at the data.
```{r mapHiC, message=FALSE}
HiTC::mapC(hic_imr90_40_XX)
```
# Using `hicClust`
`hicClust` operates directly on objects of class `HTCexp`
```{r hicClust-HTCexp, message=FALSE}
fit <- hicClust(hic_imr90_40_XX)
```
It is also possible to work on binned data. Below we choose a bin size of
$5 \times 10^5$:
```{r binning, message=FALSE}
binned <- HiTC::binningC(hic_imr90_40_XX, binsize = 1e5)
fitB <- hicClust(binned)
fitB
```
```{r plotBinned, message=FALSE}
HiTC::mapC(binned)
```
The output is of class `chac`. In particular, it can be plotted as a dendrogram
silently relying on the function `plot.dendrogram`:
```{r dendro}
plot(fitB, mode = "corrected")
```
Moreover, the output contains an element named `merge` which describes the
successive merges of the clustering, and an element `gains` which gives the
improvement in the criterion optimized by the clustering at each successive
merge.
```{r objectDesc}
head(cbind(fitB$merge, fitB$gains))
```
# Other types of input
Contacts maps can also be stored as objects of class `Matrix::dsCMatrix`, or as
plain text files. These types of input are also accepted as first argument to
`hicClust`.
# References
[1] Ambroise C., Dehman A., Neuvial P., Rigaill G., and Vialaneix N. (2019).
Adjacency-constrained hierarchical clustering of a band similarity matrix with
application to genomics. *Algorithms for Molecular Biology*, **14**, 22.
[2] Dixon J.R., *et al* (2012). Topological domains in mammalian genomes
identified by analysis of chromatin interactions. *Nature*, **485**(7398), 376.
[3] Randriamihamison N., Vialaneix N., and Neuvial P. (2021). Applicability and
interpretability of Ward's hierarchical agglomerative clustering with or without
contiguity constraints. *Journal of Classification*, **38**, 363–389.
[4] Servant N., *et al* (2012). HiTC: Exploration of High-Throughput 'C'
experiments. *Bioinformatics*, **28**(21), 2843-2844.
# Session information
```{r session}
sessionInfo()
```