Exploration of different dimensionality reduction techniques on the MNIST dataset. The objective is to reduce MNIST data samples to a 2-dimensional representation and analyze the resulting embeddings obtained with several approaches:
- Variational AutoEncoders (VAE)
- Dense NN and CNN classifiers (with 2-neuron bottleneck)
- Linear approaches: PCA, LDA
- Non-linear approaches: UMAP, t-SNE
Two different VAEs are tested: one using flat pixel features (only Dense layers), and another using 2D images as input (using Conv2d layers).
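As an illustration, here is a minimal PyTorch sketch of the dense (fully connected) variant; the layer widths (256 hidden units) and depth are assumptions for readability, not necessarily those used in this repo. The Conv2d variant replaces the dense encoder/decoder trunks with convolutional ones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseVAE(nn.Module):
    """VAE over flat 784-pixel MNIST vectors with a 2D latent space."""
    def __init__(self, z_dim=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, z_dim)       # mean of q(z|x)
        self.fc_log_var = nn.Linear(256, z_dim)  # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, log_var = self.fc_mu(h), self.fc_log_var(h)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        return self.dec(z), mu, log_var

def vae_loss(recon, x, mu, log_var):
    # Reconstruction term + KL divergence to the unit-Gaussian prior.
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return bce + kld
```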
- Dimensionality reduction (2D representation), z-dim:
- We observe that data samples are embedded such that visually similar classes (e.g. 0 and 6) lie close to each other in the 2D latent space (a plotting sketch follows this list).
- Learnt representations from the VAEs (varying mu and log_var along the axes)
- Surprisingly, using flat features can lead to good VAE latent representations.
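A hypothetical helper to produce such latent-space plots, reusing the `DenseVAE` sketch above: encode a batch of test digits and scatter-plot the posterior means `mu`, colored by label.

```python
import torch
import matplotlib.pyplot as plt

@torch.no_grad()
def plot_latent(vae, images, labels):
    # Embed each image as the mean of q(z|x) and plot the 2D point cloud.
    h = vae.enc(images.view(-1, 784))
    mu = vae.fc_mu(h).numpy()
    plt.scatter(mu[:, 0], mu[:, 1], c=labels, cmap="tab10", s=4)
    plt.colorbar(label="digit class")
    plt.xlabel("z[0]")
    plt.ylabel("z[1]")
    plt.show()
```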
A feed-forward neural network (taking flattened pixels as input) and a CNN (taking 2D images as input), both containing an inner layer with only 2 neurons, are trained to classify the images into digits. We then inspect this 2-neuron bottleneck to observe which compressed representation each network has learnt.
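A minimal sketch of the feed-forward classifier with the 2-neuron bottleneck; the layer widths are illustrative assumptions, and the CNN version would swap the dense trunk for Conv2d feature extractors.

```python
import torch
import torch.nn as nn

class BottleneckClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 2))                 # 2-neuron bottleneck
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(2, 10))

    def forward(self, x):
        z = self.features(x.view(-1, 784))    # 2D compressed representation
        return self.head(z), z                # logits for training, z for plots
```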
- Learnt representations after training: Feed-forward NN (flat features) vs. CNN (2D features):
- Evolution of the learnt MNIST manifold (2D latent space) over training batches as the feed-forward NN learns the classification task. We can observe how the network searches for the most suitable compressed 2D representation to discriminate between the categories, making the data samples almost linearly separable (visually better than the VAE latent representations); see the tracking sketch below:
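One way to record such an animation, assuming the `BottleneckClassifier` sketch above and a fixed probe set of images, is to snapshot the bottleneck activations after every optimizer step:

```python
import torch
import torch.nn.functional as F

def train_and_track(model, loader, probe_x, epochs=1, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    snapshots = []                            # one (N, 2) embedding per batch
    for _ in range(epochs):
        for x, y in loader:
            logits, _ = model(x)
            loss = F.cross_entropy(logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():             # embed the fixed probe set
                _, z = model(probe_x)
                snapshots.append(z.clone())
    return snapshots                          # render as frames of the manifold
```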
- PCA
- LDA
- UMAP
- t-SNE
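For reference, the four baselines listed above can be run on flattened MNIST pixels with scikit-learn and umap-learn. This is a sketch with default hyperparameters; note that LDA is the only supervised method of the four (it uses the labels).

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import TSNE
import umap

def baseline_embeddings(X, y):
    # X: (N, 784) float array of flattened pixels, y: (N,) digit labels.
    # t-SNE is slow on the full dataset; subsample X first if needed.
    return {
        "PCA": PCA(n_components=2).fit_transform(X),
        "LDA": LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y),
        "UMAP": umap.UMAP(n_components=2).fit_transform(X),
        "t-SNE": TSNE(n_components=2).fit_transform(X),
    }
```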
- The manifolds and latent representations learnt by the NN and CNN classifiers are qualitatively and visually better, since those networks are trained with the explicit objective of class separation (their embeddings are at least more interpretable than the VAEs').
- Linear approaches struggle to find a good low-dimensional representation.
- UMAP provides well-separated embeddings and makes outliers easy to identify, with acceptable computation time.