<!DOCTYPE html>
<html lang="en">
<head>
<title>DTI Clustering</title>
<meta name="description" content="Project page for Deep Transformation-Invariant Clustering.">
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=yes">
<meta charset="utf-8">
<!--Facebook preview-->
<meta property="og:image" content="https://imagine.enpc.fr/~monniert/DTIClustering/thumbnail.png">
<meta property="og:image:type" content="image/png">
<meta property="og:image:width" content="600">
<meta property="og:image:height" content="400">
<meta property="og:type" content="website"/>
<meta property="og:url" content="https://imagine.enpc.fr/~monniert/DTIClustering/"/>
<meta property="og:title" content="DTI Clustering"/>
<meta property="og:description" content="Project page for Deep Transformation-Invariant Clustering."/>
<!--Twitter preview-->
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:title" content="DTI Clustering" />
<meta name="twitter:description" content="Project page for Deep Transformation-Invariant Clustering."/>
<meta name="twitter:image" content="https://imagine.enpc.fr/~monniert/DTIClustering/thumbnail_twitter.png">
<!--Style-->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
<link href="style.css" rel="stylesheet">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/js/bootstrap.min.js"></script>
</head>
<body>
<div class="container" style="text-align:center; padding:2rem 15px">
<div class="row" style="text-align:center">
<h1>Deep Transformation-Invariant Clustering</h1>
<h4>NeurIPS 2020 (oral presentation)</h4>
</div>
<div class="row" style="text-align:center">
<div class="col-xs-0 col-md-3"></div>
<div class="col-xs-12 col-md-6">
<h4>
<a href="https://imagine.enpc.fr/~monniert/"><nobr>Tom Monnier</nobr></a>  
<a href="https://imagine.enpc.fr/~groueixt/"><nobr>Thibault Groueix</nobr></a>  
<a href="https://imagine.enpc.fr/~aubrym/"><nobr>Mathieu Aubry</nobr></a>
</h4>
LIGM, <nobr>École des Ponts</nobr>, <nobr>Univ Gustave Eiffel</nobr>, CNRS,
<nobr>Marne-la-Vallée, France</nobr>
</div>
<div class="hidden-xs hidden-sm col-md-1" style="text-align:left; margin-left:0px; margin-right:0px">
<a href="https://arxiv.org/pdf/2006.11132.pdf" style="color:inherit">
<i class="fa fa-file-pdf-o fa-4x"></i></a>
</div>
<div class="hidden-xs hidden-sm col-md-2" style="text-align:left; margin-left:0px;">
<a href="https://github.com/monniert/dti-clustering" style="color:inherit">
<i class="fa fa-github fa-4x"></i></a>
</div>
</div>
</div>
<div class="container" style="text-align:center; padding:1rem">
<img src="resrc/teaser.jpg" alt="teaser.jpg" class="text-center" style="width: 100%; max-width: 1100px">
<h3 style="text-align:center; padding-top:1rem">
<a class="label label-info" href="https://arxiv.org/abs/2006.11132">Paper</a>
<a class="label label-info" href="https://github.com/monniert/dti-clustering">Code</a>
<a class="label label-info" href="https://www.youtube.com/embed/j20MBc1hWGQ">Video</a>
<a class="label label-info" href="resrc/dtic_long.pptx">Slides</a>
<a class="label label-info" href="resrc/ref.bib">BibTeX</a>
</h3>
</div>
<div class="container">
<h3>Abstract</h3>
<hr/>
<p>
Recent advances in image clustering typically focus on learning better deep
representations. In contrast, we present an orthogonal approach that does not rely on
abstract features but instead learns to <b>predict image transformations</b> and directly performs
<b>clustering in pixel space</b>. This learning process naturally fits in the
gradient-based training of K-means and Gaussian mixture model, without requiring any
additional loss or hyper-parameters. It leads us to two new deep transformation-invariant
clustering frameworks, which <b>jointly learn prototypes and transformations</b>. More
specifically, we use deep learning modules that enable us to resolve invariance to spatial,
color and morphological transformations. Our approach is conceptually simple and comes with
several advantages, including the possibility to easily adapt the desired invariance to the
task and a <b>strong interpretability</b> of both cluster centers and assignments to clusters. We
demonstrate that our novel approach yields <b>competitive and highly promising results</b> on
standard image clustering benchmarks. Finally, we showcase its robustness and the
advantages of its improved interpretability by visualizing clustering results over real
photograph collections.
</p>
<h3>Video</h3>
<hr/>
<div class="row" style="text-align:center">
<div class="col-xs-6 text-center">
<h4><u>Short presentation</u> (3min)</h4>
<div class="embed-responsive embed-responsive-16by9" style="text-align:center">
<iframe class="embed-responsive-item text-center" src="https://www.youtube.com/embed/j20MBc1hWGQ" frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
style="width:100%; clip-path:inset(1px 1px);" allowfullscreen></iframe>
</div>
</div>
<div class="col-xs-6 text-center">
<h4><u>Long presentation</u> (11min)</h4>
<div class="embed-responsive embed-responsive-16by9" style="text-align:center">
<iframe class="embed-responsive-item text-center" src="https://www.youtube.com/embed/xhLUOh5PKBA" frameborder="0"
allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
style="width:100%; clip-path:inset(1px 1px);" allowfullscreen></iframe>
</div>
</div>
</div>
<h3>Approach</h3>
<hr/>
<div class="row" style="text-align: center">
<div class="col-xs-6">
<h4 style="margin-right: 20%"><u>DTI framework</u></h4>
</div>
<div class="col-xs-6">
<h4><u>Deep transformation module
<img src="https://latex.codecogs.com/svg.latex?\mathcal{T}_{f_{k}}" alt="T_f_k" border="0"/></u></h4>
</div>
</div>
<div class="row" style="text-align: center">
<div class="col-xs-6">
<img src="resrc/dti.png" alt="dti.png" class="text-center" style="width: 100%; max-width: 900px">
</div>
<div class="col-xs-6">
<img src="resrc/deep_tsf.png" alt="deep_tsf.png" class="text-center" style="width: 90%; max-width: 900px; margin-top:
10px">
</div>
</div>
<div class="row" style="text-align: center">
<div class="col-xs-6">
<div style="width: 90%; max-width: 900px; padding-top:10px">
<p>Given a sample <img src="https://latex.codecogs.com/svg.latex?x_i" alt="x_i" border="0"/> and prototypes
<img src="https://latex.codecogs.com/svg.latex?c_1" alt="c_1" border="0"/> and
<img src="https://latex.codecogs.com/svg.latex?c_2" alt="c_2" border="0"/>, standard clustering such as K-means
assigns the sample to the closest prototype. DTI clustering instead first aligns each prototype to the sample using a
family of parametric transformations (here, rotations), then picks the prototype whose alignment yields the
smallest distance.</p>
</div>
</div>
<div class="col-xs-6">
<div style="width: 100%; max-width: 900px; padding-top:10px">
<p>We predict the alignment with deep learning. Given an image
<img src="https://latex.codecogs.com/svg.latex?x_i" alt="x_i" border="0"/>, each deep parameter predictor
<img src="https://latex.codecogs.com/svg.latex?f_k" alt="f_k" border="0"/> predicts the
parameters of a sequence of transformations (here, affine, morphological and thin plate spline transformations)
that aligns the prototype <img src="https://latex.codecogs.com/svg.latex?c_k" alt="c_k" border="0"/>
to the query image <img src="https://latex.codecogs.com/svg.latex?x_i" alt="x_i" border="0"/>.</p>
</div>
</div>
</div>
<h3>Results</h3>
<hr/>
<div class="row" style="text-align: center; padding-left:1rem; padding-right:1rem; padding-bottom:1rem;">
<h4><u>Standard image clustering benchmarks</u></h4>
<img src="resrc/prototypes.jpg" alt="prototypes.jpg" class="text-center" style="width: 100%; max-width: 1000px;">
</div>
<div class="row" style="text-align:center; padding-left:1rem; padding-right:1rem; padding-bottom:1rem;">
<h4><u>MegaDepth locations</u></h4>
<img src="resrc/megadepth.jpg" alt="megadepth.jpg" class="text-center" style="width: 100%; max-width: 1000px;
margin-top: 5px">
</div>
<div class="row" style="text-align:center; padding-left:1rem; padding-right:1rem; padding-bottom:1rem;">
<h4><u>MegaDepth Florence: detailed results</u></h4>
<p>We show the 6 qualitatively best prototypes learned using DTI clustering
with 20 clusters for the Florence location in the MegaDepth dataset. For each cluster, we show the 20 samples with the
lowest reconstruction errors among all samples in the cluster, together with the corresponding transformed
prototypes. Note how the model captures real image transformations such as illumination variations and viewpoint
changes.</p>
<img src="resrc/firenze.jpg" alt="firenze.jpg" class="text-center" style="width: 100%; max-width: 1100px">
</div>
<div class="row" style="text-align:center; padding-left:1rem; padding-right:1rem;padding-bottom:1rem;">
<h4><u>Instagram hashtags</u></h4>
<p>We show the 5 qualitatively best prototypes learned using DTI clustering
with 40 clusters for different Instagram photo collections. Each collection is a large unfiltered set
of Instagram images (from 10k to 15k) associated with a particular hashtag. Identifying visual trends or iconic
poses in this setting is very challenging, as most of the images are noise. You can browse the kind of images
collected directly on Instagram:
<a href="https://www.instagram.com/explore/tags/balitemple/">#balitemple</a>,
<a href="https://www.instagram.com/explore/tags/santaphoto/">#santaphoto</a>,
<a href="https://www.instagram.com/explore/tags/trevifountain/">#trevifountain</a>,
<a href="https://www.instagram.com/explore/tags/weddingkiss/">#weddingkiss</a>,
<a href="https://www.instagram.com/explore/tags/yogahandstand/">#yogahandstand</a>.</p>
<img src="resrc/instagram.jpg" alt="instagram.jpg" class="text-center"
style="width:100%;max-width:850px;margin-top:1rem;padding-right:1.5rem;">
</div>
<h3>Resources</h3>
<hr/>
<div class="row" style="text-align: center">
<div class="col-xs-0 col-lg-0"></div>
<div class="col-xs-4 col-lg-4">
<h4>Paper</h4>
<a href="https://arxiv.org/abs/2006.11132" style="color:inherit">
<img src="resrc/paper.jpg" alt="paper.jpg" class="text-center" style="max-width:70%; border:0.15em solid;
border-radius:0.5em;"></a>
</div>
<div class="col-xs-4 col-lg-4">
<h4>Code</h4>
<a href="https://github.com/monniert/dti-clustering" style="color:inherit;">
<img src="resrc/github_repo.png" alt="github_repo.png" class="text-center"
style="max-width:70%; border:0.15em solid;border-radius:0.5em;"></a>
</div>
<div class="col-xs-4 col-lg-4">
<h4>Slides</h4>
<a href="resrc/dtic_long.pptx" style="color:inherit;">
<img src="resrc/slides.png" alt="slides.png" class="text-center"
style="max-width:70%; border:0.15em solid;border-radius:0.5em;"></a>
</div>
<div class="col-xs-0 col-lg-0"></div>
</div>
<h4 style="padding-top:0.5em">BibTeX</h4>
If you find this work useful for your research, please cite:
<div class="card">
<div class="card-block">
<pre class="card-text clickselect">
@inproceedings{monnier2020dticlustering,
title={{Deep Transformation-Invariant Clustering}},
author={Monnier, Tom and Groueix, Thibault and Aubry, Mathieu},
booktitle={NeurIPS},
year={2020},
}</pre>
</div>
</div>
<h3>Further information</h3>
<hr/>
If you like this project, please check out other related works from our group:
<h4>Follow-ups</h4>
<ul>
<li>
<a href="https://arxiv.org/abs/2104.14575">Monnier et al. - Unsupervised Layered Image Decomposition into Object
Prototypes (arXiv 2021)</a>
</li>
</ul>
<h4>Previous works on deep transformations</h4>
<ul>
<li>
<a href="https://arxiv.org/abs/1908.04725">Deprelle et al. - Learning elementary structures for 3D shape
generation and matching (NeurIPS 2019)</a>
</li>
<li>
<a href="https://arxiv.org/abs/1806.05228">Groueix et al. - 3D-CODED: 3D Correspondences by Deep Deformation (ECCV
2018)</a>
</li>
<li>
<a href="https://arxiv.org/abs/1802.05384">Groueix et al. - AtlasNet: A Papier-Mache Approach to Learning 3D
Surface Generation (CVPR 2018)</a>
</li>
</ul>
<h3>Acknowledgements</h3>
<hr/>
<p>
This work was supported in part by <a href="https://enherit.enpc.fr/">ANR project EnHerit</a> ANR-17-CE23-0008,
project Rapid Tabasco, gifts from Adobe and HPC resources from GENCI-IDRIS (Grant 2020-AD011011697). We thank
Bryan Russell, Vladimir Kim, Matthew Fisher, François Darmon, Simon Roburin, David Picard, Michael
Ramamonjisoa, Vincent Lepetit, Elliot Vincent, Jean Ponce, William Peebles and Alexei Efros for inspiring
discussions and valuable feedback.
</p>
</div>
<div class="container" style="padding-top:3rem; padding-bottom:3rem">
<p style="text-align:center">
© This webpage was in part inspired by this
<a href="https://github.com/monniert/project-webpage">template</a>.
</p>
</div>
</body>
</html>