CNN AutoEncoder

Convolutional AutoEncoder

CNN AE Description

The encoder consists of a stack of Conv1D and MaxPooling1D layers (max pooling is used for down-sampling along the time axis), while the decoder consists of a stack of Conv1D and UpSampling1D layers.

Input — (channels_num=61, window_size=500, 1)

Encoder

* Conv1D(nb_filters=64, filter_length=5) + ReLU ---- (channels_num=61, window_size=500, 64)
* MaxPooling1D ------------------------------------- (channels_num=61, window_size=250, 64)
* Conv1D(nb_filters=32, filter_length=5) + ReLU ---- (channels_num=61, window_size=250, 32)
* MaxPooling1D ------------------------------------- (channels_num=61, window_size=125, 32)
* Dense + ReLU ------------------------------------- (channels_num=61, window_size=125, 1)

Decoder

* Conv1D(nb_filters=32, filter_length=5) + ReLU ---- (channels_num=61, window_size=125, 32)
* UpSampling1D ------------------------------------- (channels_num=61, window_size=250, 32)
* Conv1D(nb_filters=64, filter_length=5) + ReLU ---- (channels_num=61, window_size=250, 64)
* UpSampling1D ------------------------------------- (channels_num=61, window_size=500, 64)
* Conv1D(nb_filters=1, filter_length=5) + Sigmoid -- (channels_num=61, window_size=500, 1)

Output — (channels_num=61, window_size=500, 1)
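
As a sketch, this architecture can be assembled in Keras roughly as follows. The padding='same' convolutions and the TimeDistributed wrappers (which make each filter run along the time axis of a single channel, matching the note under Hyper-Parameters below) are assumptions, not the original code:

```python
from keras.models import Sequential
from keras.layers import Conv1D, Dense, MaxPooling1D, TimeDistributed, UpSampling1D

channels_num, window_size = 61, 500

model = Sequential()
# Encoder: Conv1D + MaxPooling1D, applied per channel via TimeDistributed.
model.add(TimeDistributed(Conv1D(64, 5, padding='same', activation='relu'),
                          input_shape=(channels_num, window_size, 1)))         # (61, 500, 64)
model.add(TimeDistributed(MaxPooling1D(2)))                                    # (61, 250, 64)
model.add(TimeDistributed(Conv1D(32, 5, padding='same', activation='relu')))   # (61, 250, 32)
model.add(TimeDistributed(MaxPooling1D(2)))                                    # (61, 125, 32)
model.add(Dense(1, activation='relu'))                                         # (61, 125, 1)
# Decoder: mirror of the encoder, with UpSampling1D for up-sampling.
model.add(TimeDistributed(Conv1D(32, 5, padding='same', activation='relu')))   # (61, 125, 32)
model.add(TimeDistributed(UpSampling1D(2)))                                    # (61, 250, 32)
model.add(TimeDistributed(Conv1D(64, 5, padding='same', activation='relu')))   # (61, 250, 64)
model.add(TimeDistributed(UpSampling1D(2)))                                    # (61, 500, 64)
model.add(TimeDistributed(Conv1D(1, 5, padding='same', activation='sigmoid'))) # (61, 500, 1)
```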

Learning process

Adam was chosen as the optimizer, with MSE as the loss and MAE as an additional metric.
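
In Keras terms this training configuration is a one-liner (a sketch; optimizer arguments such as the learning rate are left at their defaults, since the page does not specify them):

```python
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
```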

Hyper-Parameters

Building a CNN model involves choosing many hyper-parameters; among others, we must decide on the sizes of the filters.

Each filter is applied to just one channel; in total there are 61 channels in the initial dataset.

Experiments

  • Window Size — 0.5 s (chosen from the range 0.05 s to 5 s) is the most suitable variant, as the target (emotions) occurs quickly, but not instantaneously.
  • Number of filters — 64 and 32 (chosen from the range 16 to 128). The number of filters determines the number of model parameters, and, as our dataset is comparatively small, a huge number of parameters would cause the model to overfit.
  • Filter Length — 5 (chosen from the range 2 to 100), as it worked best.
  • Dropout coefficient — 0.4 (its placement is sketched below).
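
The page gives only the dropout coefficient; where the Dropout layers sit is not specified, so the placement below (after each pooling step of the encoder) is purely an assumption:

```python
from keras.layers import Dropout

# Assumed placement, e.g. after each MaxPooling1D in the encoder:
model.add(Dropout(0.4))
```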

Batch Selection

We use a batch_generator for model fitting: on each iteration the model takes a random window-sized batch from a random file, and batches do not repeat within one epoch. There are 31 files in total (each 300 s to 1000 s long) and 1 file for validation.
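
A minimal sketch of such a generator is shown below; files, window_size, and batch_size are illustrative names, and the non-overlapping window grid is an assumption rather than the repository's actual batch_generator:

```python
import numpy as np

def batch_generator(files, window_size=500, batch_size=32):
    """Sketch of the batch selection described above.

    `files` is a list of arrays of shape (61, file_length). On each step a
    random file is chosen and `batch_size` not-yet-used windows are cut from
    it; the pool is refilled only when the epoch is exhausted.
    """
    while True:
        # Per-epoch pool of unused window start positions for every file.
        pool = {f: list(range(0, data.shape[1] - window_size + 1, window_size))
                for f, data in enumerate(files)}
        for starts in pool.values():
            np.random.shuffle(starts)
        while any(len(s) >= batch_size for s in pool.values()):
            f = np.random.choice(
                [f for f, s in pool.items() if len(s) >= batch_size])
            starts = [pool[f].pop() for _ in range(batch_size)]
            batch = np.stack([files[f][:, s:s + window_size] for s in starts])
            batch = batch[..., np.newaxis]  # -> (batch_size, 61, window_size, 1)
            yield batch, batch              # autoencoder: target equals input
```

The generator can then be passed to model.fit_generator with a matching steps_per_epoch.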

Further Work

  1. More experiments with hyper-parameters
  2. More experiments with the loss and metric functions
  3. Evaluate the performance with an RNN and experiment once more.