---
title: "Bayesian Model-Based Clustering for Community Detection"
author: "Alessandro Mirone - 966880"
date: "28/4/2022"
output: ioslides_presentation
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
```{r library, message=FALSE, warning=FALSE, include=FALSE}
setwd("E:/DSE flash/Bayesian Analysis/Project/presentation")
library(tidyverse)
library(rtweet)
library(plyr)
library(Gifi)
library(rgl)
library(plotly)
library(knitr)
library(mclust)
library(compositions)
```
## Introduction (I)
### *Social analysis framework:*
The aim of this work is to examine individuals' digital interactions in the form of tweets and use the information retained through such analysis to verify whether it is possible to draw conclusions about one's socio-political stance.
### *Scope of the project:*
The question is thus addressed as a search for relevant indicators of socio-political stance, a categorization based on such indicators and, finally, a prediction of one's most probable beliefs about social and political phenomena given one's membership in a certain social group.
## Introduction (II)
### *Organization:*
To do so, the project is divided into three parts:
- Data retrieval and data set construction.
- Variable scaling.
- Clustering of observations.
## Methodology
- Qualitative analysis to find suitable indicators of social beliefs in the form of digital practices (tweets).
- Coding of such indicators into ordinal variables.
- Transformation of the ordinal variables into numerical variables through non-linear principal component analysis with optimal scaling.
- Use of model-based clustering (a Gaussian finite mixture model) with Bayesian regularization to aggregate the newly formed coordinates and find groups of individuals characterized by similar beliefs, infer each group's habits and analyze the structure of each identified community.
## Qualitative Analysis of Digital Practices (I)
By observing individuals' actions in a digital space we can draw conclusions about their beliefs and behaviour. In particular, we can look at interactions on social media ("posting"). This analysis focuses on Twitter; we can use hashtags, a characteristic feature of this platform, to collect individuals' stances on relevant topics.
## Qualitative Analysis of Digital Practices (II)
To collect individual beliefs, it is first necessary to define some dimensions, which will later be coded into variables. Six dimensions were chosen for this project:
- Racial
- Activism
- Partisan
- Political
- Civil
- Party
## Qualitative Analysis of Digital Practices (III)
For each axis, two contrasting hashtags were selected based on a search for influential tweets among US conservatives and liberals. The hashtags, for each corresponding dimension, are:
- "BlackLivesMatter" Vs "AllLivesMatter"
- "RepublicansAreTheProblem" Vs "DemocratsAreADisaster"
- "VaccinesWork" Vs "Vaxxed"
- "WeWantVotingRights" Vs "VoteThemAllOut2022"
- "ProChoice" Vs "ProLife"
- "VoteBlue" Vs "VoteRed"
## Data set Construction (I)
Using the R statistical software and the rtweet package, tweets from 15927 users were downloaded from Twitter: tweets were filtered so as to contain one or the other hashtag for each of the six axes, excluding ambiguous matches. The results were combined into a single data frame, allowing observations for each axis to be grouped by user_id. Variables were initially coded as ordinal (a minimal retrieval sketch for one axis follows the list):
- valued 1 for tweets expressing progressive hashtags.
- valued -1 for tweets expressing conservative hashtags.
- valued 0 if the user did not express a preference on the corresponding issue.
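A minimal sketch of how one axis might have been retrieved and coded with rtweet is shown below (hashtags as listed earlier; the object names and the handling of ambiguous matches are illustrative assumptions, not the exact script used):
```{r retrieval-sketch, echo=TRUE, eval=FALSE}
# Sketch for the "Racial" axis: one query per hashtag, users coded 1 / -1,
# ambiguous users (matching both hashtags) dropped.
pro <- search_tweets("#BlackLivesMatter", n = 5000, include_rts = FALSE)
con <- search_tweets("#AllLivesMatter", n = 5000, include_rts = FALSE)
racial <- dplyr::bind_rows(
  dplyr::transmute(pro, user_id, Racial = 1L),
  dplyr::transmute(con, user_id, Racial = -1L)
) %>%
  dplyr::group_by(user_id) %>%
  dplyr::filter(dplyr::n_distinct(Racial) == 1) %>%  # drop ambiguous users
  dplyr::slice(1) %>%
  dplyr::ungroup()
# Users absent from both queries receive 0 when the six axes are merged by user_id.
```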
## Original Data Set
```{r Table 1, echo=FALSE, message=FALSE, warning=FALSE}
data <- read.csv("twitter.data.csv") #original data
data<-data[,-1]
colnames(data) <- c("User","Racial","Activism","Partisan","Political","Civil","Party")
kable(head(data, 5), caption = "Table 1: first five rows of the initial data set")
```
## Data set Construction (II)
The resulting data set was then rescaled through non-linear PCA with the Gifi package and princals(), in order to transform the ordinal variables into numerical ones that can be used for clustering.
```{r table 2: new data coordinates, echo=FALSE, message=FALSE, warning=FALSE}
data <- read.csv("twitter.data.csv")
data<-data[,-c(1,2)]
colnames(data) <- c("Racial","Activism","Partisan","Political","Civil","Party")
fit.data <- princals(data, ndim = 6)
new.tw <- as.data.frame(fit.data$objectscores) # new coordinates on the principal components
colnames(new.tw) <- c("Racial","Activism","Partisan","Political","Civil","Party")
kable(head(new.tw, 5), caption = "Table 2: first five rows of the reconstructed data set")
```
## Summary of princals()
```{r summary, echo=FALSE, message=FALSE, warning=FALSE}
summary(fit.data)
```
## Loadings
```{r Components correlation, echo=FALSE, message=FALSE, warning=FALSE}
kable(fit.data$loadings, caption = "Table 3: loadings") #loadings
```
## Plot of First Three Components
```{r 1_3 axis, echo=FALSE, message=FALSE, warning=FALSE}
fig1 <- plot_ly(new.tw, x = ~Racial, y = ~Activism, z = ~Partisan)
fig1 <- fig1 %>% add_markers()
fig1 <- fig1 %>% layout(scene = list(xaxis = list(title = 'Racial'),
yaxis = list(title = 'Activism'),
zaxis = list(title = 'Partisan')))
fig1
```
## Plot of Second Three Components
```{r 4_6 axis, echo=FALSE, message=FALSE, warning=FALSE}
fig2 <- plot_ly(new.tw, x = ~Political, y = ~Civil, z = ~Party)
fig2 <- fig2 %>% add_markers()
fig2 <- fig2 %>% layout(scene = list(xaxis = list(title = 'Political'),
yaxis = list(title = 'Civil'),
zaxis = list(title = 'Party')))
fig2
```
## Some Considerations about the Results
The initial plots of the reconstructed observations suggest the presence of at least three macro groups. Given that the axes are constructed to be almost orthogonal, all dimensions are retained during the variable transformation. The loadings are useful to map the reconstructed coordinates into the new space, giving a sense of direction; for example, a point identifying an individual with liberal beliefs about minority rights will lie in the top left corner of the plot, near the origin of the z axis and of the y axis, corresponding to positive values of the first three components.
## Gaussian Mixture Model
In model-based clustering, the data $y = (y_1, ... ,y_n)$ are assumed to be generated by a mixture model with density
$$
\begin{equation}
f(y) = \prod_{i=1}^{n}\sum_{k=1}^{G} \tau_k f_k(y_i|\theta_k)
\end{equation}
$$
where $f_k(y_i|\theta_k)$ is a probability distribution with parameters $\theta_k$, and $\tau_k$ is the probability of belonging to the $k$-th component. In the multivariate Gaussian mixture model, the $f_k$ are taken to be multivariate normal distributions, parameterized by their means $\mu_k$ and covariances $\Sigma_k$:
$$
\begin{equation}
f_k(y_i|\theta_k) = \phi(y_i|\mu_k,\Sigma_k) \equiv |2\pi\Sigma_k|^{-\frac{1}{2}} \exp \{-\frac{1}{2}(y_i - \mu_k)^T \Sigma_k^{-1} (y_i - \mu_k)\}
\end{equation}
$$
where $\theta_k = (\mu_k , \Sigma_k)$.
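As a small illustration of this definition, the mixture density can be written directly in base R; the sketch below uses toy parameters and is not the implementation used later (which relies on mclust):
```{r mixture-density-sketch, echo=TRUE, eval=FALSE}
# Multivariate normal density phi(y | mu, Sigma) via the Mahalanobis distance
dmvn <- function(y, mu, Sigma) {
  d <- length(mu)
  exp(-0.5 * mahalanobis(y, mu, Sigma)) / sqrt((2 * pi)^d * det(Sigma))
}
# Mixture density f(y_i) = sum_k tau_k * phi(y_i | mu_k, Sigma_k)
mix_density <- function(y, tau, mu, Sigma) {
  sum(vapply(seq_along(tau),
             function(k) tau[k] * dmvn(y, mu[[k]], Sigma[[k]]),
             numeric(1)))
}
# Toy two-component example in two dimensions
tau <- c(0.6, 0.4)
mu <- list(c(0, 0), c(3, 3))
Sigma <- list(diag(2), 0.5 * diag(2))
mix_density(c(1, 1), tau, mu, Sigma)
```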
## EM algorithm for Multivariate Gaussian Mixtures (I)
The parameters of each mixture component are usually estimated through the Expectation-Maximization (EM) algorithm, based on maximum likelihood estimation, to find the estimates $\hat{\theta}_k$. This is a general approach to maximum likelihood for problems in which the data can be viewed as consisting of *n* multivariate observations $(y_i, z_i)$, in which $y_i$ is observed and $z_i$ is unobserved. If the $(y_i, z_i)$ are independent and identically distributed (iid) according to a probability distribution $f$ with parameters $\theta$, then the *complete-data likelihood* is
$$
\begin{equation}
\mathcal{L_C}(y,z| \theta) = \prod_{i=1}^{n}f(y_i , z_i | \theta) \
\end{equation}
$$ where $y = (y_1, ... ,y_n)$ and $z = (z_1, ... ,z_n)$.
## EM algorithm for Multivariate Gaussian Mixtures (II)
The *observed-data likelihood* $\mathcal{L_O}(y|\theta)$, also called the *mixture likelihood*, can be obtained by integrating the unobserved data $z$ out of the complete-data likelihood: $$
\begin{equation}
\begin{split}
\mathcal{L_O}(y| \theta) & = \int \mathcal{L_C}(y,z|\theta) dz \\
& = \prod_{i=1}^{n}\sum_{k=1}^{G} \tau_k\phi(y_i|\mu_k,\Sigma_k)
\end{split}
\end{equation}
$$
## EM algorithm for Multivariate Gaussian Mixtures (III)
The vector $z = (z_1, ... ,z_n)$, where $z_i \in \{1,... ,G\}$, represents the cluster membership of observation $i$. It is possible to use Bayes' theorem to estimate the conditional probability that $Z_i = k$ given $y_i$: $$
\begin{equation}
P(Z_i = k | Y_i) = \frac{\tau_k \phi(y_i|\mu_k, \Sigma_k)}{\sum_{j=1}^{G} \tau_j \phi(y_i|\mu_j, \Sigma_j)}
\end{equation}
$$ Maximizing the complete-data log-likelihood would yield the best parameter estimates $\hat{\theta}_k$ for the model. However, this requires estimating $Z_i$, which is a function of $\theta_k$, while $\theta_k$ in turn depends on the values of $Z_i$.
## EM algorithm for Multivariate Gaussian Mixtures (IV)
The EM algorithm solves this problem by iteratively estimating both the conditional probabilities $P(Z_i = k | Y_i)$ and the parameters $\theta_k$ in two steps. The first is the Expectation step (E-step) in which, given an initial set of parameters $\theta_k^{(0)}$, the value $\hat{z}_{i,k}$ of $z_{i,k}$ maximizing the complete-data likelihood is the estimated conditional probability that observation $i$ belongs to group $k$: $$
\begin{equation}
\hat{z}_{i,k}^{(s)} = \frac{\hat{\tau}_k^{(s-1)} f_k(y_i|\hat{\theta}_k^{(s-1)})}{\sum_{j=1}^{G} \hat{\tau}_j^{(s-1)} f_j(y_i|\hat{\theta}_j^{(s-1)})}
\end{equation}
$$ where the superscript $(s)$ denotes the $s$-th iteration of the algorithm and $(s-1)$ the previous one.
## EM algorithm for Multivariate Gaussian Mixtures (V)
The second is the Maximization step (M-step), which involves maximizing the complete-data likelihood in terms of $\tau_k$ and $\theta_k$ with $z_{i,k}$ fixed at the values computed in the E-step, namely $\hat{z}_{i,k}$. At the start of each iteration $(s)$, the observed log-likelihood is evaluated by replacing $\mu_k$, $\Sigma_k$ and $\tau_k$ with $\hat{\mu}_k^{(s-1)}$, $\hat{\Sigma}_k^{(s-1)}$ and $\hat{\tau}_k^{(s-1)}$. At the end of iteration $(s)$, the values of $\hat{\theta}_k$ in the observed log-likelihood are evaluated again, replacing the estimates at $(s-1)$ with those of the current iteration. The algorithm stops when $\ell^{(s)}(\theta|Y_1,...,Y_n) - \ell^{(s-1)}(\theta|Y_1,...,Y_n) < \epsilon$, or when the maximum number of iterations is reached.
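The two steps can be condensed into a single (unregularized) EM iteration; the sketch below is purely illustrative, since the actual estimation in this project is carried out by mclust:
```{r em-step-sketch, echo=TRUE, eval=FALSE}
# One EM iteration for a G-component Gaussian mixture (no regularization);
# Y is an n x d numeric matrix, mu and Sigma are lists of length G.
em_step <- function(Y, tau, mu, Sigma) {
  n <- nrow(Y); d <- ncol(Y); G <- length(tau)
  dmvn <- function(y, m, S)
    exp(-0.5 * mahalanobis(y, m, S)) / sqrt((2 * pi)^d * det(S))
  # E-step: responsibilities z[i, k] = P(Z_i = k | y_i)
  z <- sapply(1:G, function(k) tau[k] * apply(Y, 1, dmvn, m = mu[[k]], S = Sigma[[k]]))
  z <- z / rowSums(z)
  # M-step: update tau_k, mu_k, Sigma_k with the responsibilities held fixed
  nk <- colSums(z)
  tau <- nk / n
  mu <- lapply(1:G, function(k) colSums(z[, k] * Y) / nk[k])
  Sigma <- lapply(1:G, function(k) {
    D <- sweep(Y, 2, mu[[k]])
    crossprod(D * sqrt(z[, k])) / nk[k]   # W_k / n_k
  })
  list(tau = tau, mu = mu, Sigma = Sigma, z = z)
}
```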
## Drawbacks of the EM algorithm
The EM algorithm is widely used in model-based clustering with good results, but it can fail to converge, instead diverging to a point of infinite likelihood. This is because, as $\mu_k \to y_i$ and $|\Sigma_k| \to 0$ for some observation $i$ and mixture component $k$, i.e. as the component mean approaches the observation and the component covariance becomes singular, the likelihood for that observation becomes infinite and hence so does the whole mixture likelihood.
## Bayesian Regularization (I)
The procedure involves placing a prior distribution on the parameters that eliminates failure due to singularity, while having little effect on stable results obtainable without a prior. The Bayesian predictive density for the data is assumed to be of the form
$$
\begin{equation}
\mathcal{L}(Y|\tau_k,\mu_k,\Sigma_k) \mathcal{P}(\tau_k,\mu_k,\Sigma_k|\xi)
\end{equation}
$$
where $\mathcal{L}$ is the mixture likelihood and $\mathcal{P}$ is a prior distribution on the parameters $\tau_k$, $\mu_k$ and $\Sigma_k$, with hyperparameters denoted by $\xi$. The objective is to find the MAP (maximum a posteriori) estimate of the mixture parameters.
## Bayesian Regularization (II)
Regarding the choice of priors for $\theta_k$ , it is assumed that:
- the mixture probabilities $\tau_k$ are uniformly distributed on the G-simplex.
- each mean vector $\mu_k$ is normally distributed, conditional on the covariance matrix:
$$
\begin{equation}
\mathcal{P}(\mu_k|\Sigma_k) \sim \mathcal{N}(\mu_p,\Sigma_k/\kappa_p) \propto |\Sigma_k|^{-\frac{1}{2}}\exp\{-\frac{\kappa_p}{2} trace [(\mu_k - \mu_p)^T \Sigma_k^{-1} (\mu_k - \mu_p)] \}
\end{equation}
$$
## Bayesian Regularization (III)
- the prior distribution for each covariance matrix $\Sigma_k$ is an Inverse-Wishart
$$
\begin{equation}
\mathcal{P}(\Sigma_k) \sim inverseWishart(\nu_p,\Lambda_p) \propto |\Sigma_k|^{-\frac{\nu_p + d + 1}{2}}\exp\{-\frac{1}{2} trace (\Sigma_k^{-1} \Lambda_p)\}
\end{equation}
$$
where $d$ is the number of dimensions and the subscript $p$ indicates a prior hyperparameter. These are the *mean*, *shrinkage* and *degrees of freedom*, respectively $\mu_p$ , $\kappa_p$ and $\nu_p$ while the hyperparameter $\Lambda_p$ is the *scale* matrix of the inverse-Wishart prior.
## Bayesian Regularization (IV)
The joint prior is a normal-inverse-Wishart
$$
\begin{equation}
\begin{split}
\mathcal{P}(\theta|\xi) & \sim Normal-inverseWishart(\mu_p,\kappa_p,\nu_p,\Lambda_p) \\
&\propto |\Sigma|^{-\frac{\nu_p + d + 2}{2}}\exp\{-\frac{1}{2} trace (\Lambda_p\Sigma^{-1})\} \exp \{-\frac{\kappa_p}{2}(\mu -\mu_p)^T\Sigma^{-1}(\mu-\mu_p)\} \\
&=|\Sigma|^{-\frac{\nu_p + d + 2}{2}}\exp\{-\frac{1}{2} trace (\Lambda_p\Sigma^{-1})\} \exp \{-\frac{\kappa_p}{2} trace[\Sigma^{-1}(\mu -\mu_p)(\mu -\mu_p)^T]\}
\end{split}
\end{equation}
$$
as the independent prior over the mixture proportions is constant and therefore $\tau$ drops out of the expression. This is a conjugate prior for the multivariate normal distribution, because the posterior can also be expressed as the product of a normal distribution and an inverse-Wishart.
## Model characterization
The covariance matrices were assumed to be ellipsoidal, while their volumes, shapes and orientations were allowed to vary across components (the VVV model in mclust). The hyperparameters $\xi$ are assumed to be equal across all components; they are listed below, followed by a short mclust sketch:
- $\mu_p$ : the mean vector of the data
- $\kappa_p$ : .01
- $\nu_p$ : $d + 2 = 8$
- $\Lambda_p$ : $\frac{var(data)}{G^{2/d}}$
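These values match the defaults of mclust's conjugate prior; the short sketch below shows how they can be inspected (the comments restate the values listed above; exact defaults may vary slightly across mclust versions):
```{r prior-sketch, echo=TRUE, eval=FALSE}
# defaultPrior() is the function invoked through priorControl() in the model fits below
prior <- defaultPrior(new.tw, G = 9, modelName = "VVV")
prior$mean       # mu_p: the mean vector of the data
prior$shrinkage  # kappa_p = 0.01
prior$dof        # nu_p = d + 2 = 8
prior$scale      # Lambda_p = var(data) / G^(2/d)
```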
## Posterior M-step Estimators
Then, the posterior estimators for the mean and covariance that maximize the expected complete-data log-likelihood in the M-step of the EM algorithm become:
$$
\begin{equation}
\begin{split}
&\hat{\mu}_k = \frac{n_k\bar{y}_k + \kappa_p \mu_p}{n_k + \kappa_p}\\\\
&\hat{\Sigma}_k = \frac{\Lambda_p + \frac{\kappa_pn_k}{(n_k + \kappa_p)}(\bar{y}_k-\mu_p)(\bar{y}_k-\mu_p)^T + W_k}{\nu_p+n_k+d+2}
\end{split}
\end{equation}
$$
where $z_{i,k}$ is the conditional probability that observation $i$ belongs to the $k$-th component,\
$n_k \equiv \sum_{i=1}^n z_{i,k}$, $\bar{y}_k \equiv \sum_{i=1}^n\frac{z_{i,k}y_i}{n_k}$ and $W_k \equiv \sum_{i=1}^nz_{i,k}(y_i-\bar{y}_k)(y_i - \bar{y}_k)^T$.
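For concreteness, these MAP updates can be written as a small R function; the sketch below follows the notation above, and the function name and argument layout are illustrative (they are not part of mclust):
```{r map-mstep-sketch, echo=TRUE, eval=FALSE}
# MAP M-step update for component k, given responsibilities z (an n x G matrix)
map_mstep_k <- function(Y, z, k, mu_p, kappa_p, nu_p, Lambda_p) {
  d <- ncol(Y)
  n_k <- sum(z[, k])
  ybar <- colSums(z[, k] * Y) / n_k
  Dev <- sweep(Y, 2, ybar)
  W_k <- crossprod(Dev * sqrt(z[, k]))   # scatter matrix W_k
  mu_k <- (n_k * ybar + kappa_p * mu_p) / (n_k + kappa_p)
  Sigma_k <- (Lambda_p +
              (kappa_p * n_k / (n_k + kappa_p)) * outer(ybar - mu_p, ybar - mu_p) +
              W_k) / (nu_p + n_k + d + 2)
  list(mu = mu_k, Sigma = Sigma_k)
}
```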
## Model Evaluation (I)
50 possible models corresponding to $G = 1,...,50$ were evaluated based on their Bayesian Information Criterion (BIC) given by
$$
\begin{equation}
BIC_{\mathcal{M}} = 2 loglik_{\mathcal{M}}(y,\theta^*) - df_{\mathcal{M}} log(n)
\end{equation}
$$
where $loglik_{\mathcal{M}}(y,\theta^*)$ is the log-likelihood evaluated at the MAP estimate $\theta^*$ for model $\mathcal{M}$ and the data, $n$ is the number of observations and $df_{\mathcal{M}}$ is the number of free parameters (degrees of freedom) of model $\mathcal{M}$, corresponding to $df_{\mathcal{M}} = G_{\mathcal{M}} (\frac{(d\times d - d)}{2} + 2d + 1)$.
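Written out in R, the criterion and the degrees-of-freedom formula above amount to the following sketch (mclustBIC(), used below, computes this internally):
```{r bic-sketch, echo=TRUE, eval=FALSE}
# BIC under mclust's sign convention: larger values indicate a better model
bic_model <- function(loglik, df, n) 2 * loglik - df * log(n)
# Degrees of freedom for the VVV model with G components in d dimensions (formula above)
df_vvv <- function(G, d) G * ((d * d - d) / 2 + 2 * d + 1)
```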
## Model Evaluation (II)
```{r BIC values for VVV, message=FALSE, warning=FALSE, include=FALSE}
set.seed(0)
BICmap <- mclustBIC(new.tw, G = 1:50, prior = priorControl(functionName = "defaultPrior"), modelNames = "VVV") # with Bayesian regularization
BICsp <- mclustBIC(new.tw, G = 1:50, modelNames = "VVV", control = emControl(eps = 0, tol = c(0.9, 0.9))) # no prior, loose EM tolerances
BICnm <- mclustBIC(new.tw, G = 1:50, modelNames = "VVV") # no prior, default EM settings
```
In a Gaussian model-based setting without regularization, any component with fewer than $d$ points will tend to have a singular covariance matrix, and hence produce an infinite likelihood, even if there is a true cluster with fewer than $d$ points. Thus singularities might lead to incorrect model specification, as the algorithm discards these solutions. The Bayesian regularization discussed above resolves this problem by allowing the likelihood to increase smoothly rather than jumping to infinity: when a proper prior is defined, there are generally no paths in parameter space along which the posterior density tends to infinity.
## Model Evaluation (III)
```{r fig 3, echo=FALSE, fig.align='center', fig.show='hold', message=FALSE, warning=FALSE, out.height="70%", out.width="70%"}
df <- data.frame(BIC.reg = as.vector(BICmap), BIC.sp = as.vector(BICsp), BIC.cl = as.vector(BICnm), G = 1:50) # black: regularized; solid red: unregularized; dashed red: unregularized, loose tolerances
ggplot(df, aes(x = G)) +
geom_line(aes(y = BIC.sp),linetype= "dashed", color = "red") +
geom_line(aes(y=BIC.reg),color = "black")+
geom_line(aes(y = BIC.cl), color = "red")+
geom_point(aes(y = BIC.sp),shape=1, color = "red") +
geom_point(aes(y=BIC.reg),color = "black")+
geom_point(aes(y = BIC.cl), color = "red") +
ylab("BIC")+
scale_x_continuous(breaks=seq(1,50,2))+
theme_test()
```
Following these results, $G = 9$ was chosen, as the BIC does not improve much beyond that value of $G$ and the model is kept parsimonious.
## Results (First Three Components)
```{r message=FALSE, warning=FALSE, include=FALSE}
result <- Mclust(new.tw, G = 9, prior = priorControl(), modelNames = "VVV")
result.df<-data.frame(result$data, result$classification)
colnames(result.df)<-c("Racial","Activism","Partisan","Political","Civil","Party","Group")
```
```{r plot, echo=FALSE, message=FALSE, warning=FALSE}
fig4 <- plot_ly(result.df, x = ~Racial, y = ~Activism, z = ~Partisan, color = ~ as.character(result.df$Group), colors = "Set1")
fig4 <- fig4 %>% add_markers()
fig4
```
## Results (Second Three Components)
```{r fig5, echo=FALSE, message=FALSE, warning=FALSE}
fig5 <- plot_ly(result.df, x = ~Political, y = ~Civil, z = ~Party, color = ~ as.character(result.df$Group), colors = "Set1")
fig5 <- fig5 %>% add_markers()
fig5
```
## Discussion (I)
From these plots it is evident that most of the variability in the data is due to the large variance of the $8$th group, while all the other components have much more concentrated densities. Note that there are many repeated observations in the data, so despite what visual inspection may suggest, group $8$ is not the largest, as confirmed by the estimated $\tau_k$:
```{r table 4, echo=FALSE, message=FALSE, warning=FALSE}
temp <- as.vector(result[["parameters"]][["pro"]])
temp<-format(round(temp, 3), nsmall = 3)
df2 <-data.frame(temp,c("tau.1","tau.2","tau.3","tau.4","tau.5","tau.6","tau.7","tau.8","tau.9"))
colnames(df2)<-c("var1","var2")
df2<-df2 %>% pivot_wider(names_from = var2, values_from = var1)
kable(df2, caption = "Table 4: mixture proportions")
```
## Discussion (II)
- The first component represents individuals who showed a liberal alignment on politically based contrasts, but took no further action on other issues. This group can be interpreted as holding an anti-conservative ethos, though not necessarily a pro-leftist one, as some of its members hold typically conservative positions regarding party-backed social issues (the idea that all politicians are corrupt or inefficient).
- The second component contains individuals who expressed liberal sentiments regarding issues of racial equality. It is the second most numerous community and is very cohesive.
- Group 3 represents people who advocate for the use of vaccines. It is the most numerous, and its members are likely of centrist or Democratic political extraction.
## Discussion (III)
- The fourth group is instead composed of individuals who embrace the idea of reducing state interference, a rhetoric typically sustained by conservative parties. A few of them also support the Republican-backed hashtag campaign "DemocratsAreADisaster", supporting the idea that this group identifies a populist or right-wing ethos.
- Component 5 has the opposite interpretation with respect to the previous one: this group identifies liberal Democrats, people who showed support for Democratic-backed social battles for voting rights, although they did not express other forms of liberal practice.
- The sixth identified community, representing true conservatives, is much smaller than the others: its members hold conservative views regarding civil rights and are likely driven by a traditionalist ethos.
## Discussion (IV)
- The seventh group is composed of individuals who express their preference for the Democratic party through their digital practices. However, many of them also show discontent toward politicians and question the effectiveness of vaccines; this suggests that this group identifies the Democratic party's popular base.
- As shown by the clustering, component 8 is the most heterogeneous. It includes many subgroups, as well as a couple of major ones, namely people who hold a progressive stance regarding civil rights and moderate conservatives. Among the subgroups, it is possible to identify left-wing supporters, engaged Democrats, right-wing extremists and fundamentalist conservatives. This community is most probably a container for the minority components of the social space.
## Discussion (V)
- Finally, group 9, opposed to the third, contains individuals who are skeptical of vaccines and identifies non-political conservatives.
## Conclusions
The method implemented constitutes a clustering procedure that can be applied to ordinal data after NLPCA. It is robust to singularities in the covariance matrices of the components thanks to the Bayesian regularization: this guarantees that components formed by identical observations (quite common with ordinal data), whose estimated mean equals the observations' value, are not overlooked by the clustering algorithm, making it possible to specify the correct model.
## Further comments
Two main issues remain:
- the sparsity of the original matrix
- the lack of a measure of meaningful relative distance between observations.
Both of these difficulties could be addressed in multiple ways: a denser initial matrix, more dimensions, different encodings (e.g. RNN-based), or assuming a multinomial distribution for the data and choosing a suitable prior.