CNN AutoEncoder
The encoder consists of a stack of Conv1D and MaxPooling1D layers (max pooling is used for down-sampling along the time axis), while the decoder consists of a stack of Conv1D and UpSampling1D layers; a Keras sketch of the full stack follows the layer list below.
Input — (channels_num=61, window_size=500, 1)
* Conv1D(nb_filters=64, filter_length=5) + ReLU ---- (channels_num=61, window_size=500, 64)
* MaxPooling1D ------------------------------------- (channels_num=61, window_size=250, 64)
* Conv1D(nb_filters=32, filter_length=5) + ReLU ---- (channels_num=61, window_size=250, 32)
* MaxPooling1D ------------------------------------- (channels_num=61, window_size=125, 32)
* Dense + ReLU ------------------------------------- (channels_num=61, window_size=125, 1)
* Conv1D(nb_filters=32, filter_length=5) + ReLU ---- (channels_num=61, window_size=125, 32)
* UpSampling1D ------------------------------------- (channels_num=61, window_size=250, 32)
* Conv1D(nb_filters=64, filter_length=5) + ReLU ---- (channels_num=61, window_size=250, 64)
* UpSampling1D ------------------------------------- (channels_num=61, window_size=500, 64)
* Conv1D(nb_filters=1, filter_length=5) + Sigmoid --- (channels_num=61, window_size=500, 1)
Output — (channels_num=61, window_size=500, 1)
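A minimal Keras sketch of this stack, assuming each of the 61 channels is fed as an independent (window_size, 1) sample, `padding='same'` convolutions, and pool/up-sampling size 2 (the original nb_filters/filter_length arguments are written as the modern filters/kernel_size):

```python
from tensorflow.keras import layers, models

WINDOW_SIZE = 500  # 0.5 s windows, per the text

def build_cnn_autoencoder(window_size=WINDOW_SIZE):
    # Each EEG channel is treated as an independent (window_size, 1) sample.
    inputs = layers.Input(shape=(window_size, 1))

    # Encoder: Conv1D + MaxPooling1D stacks for temporal down-sampling
    x = layers.Conv1D(64, 5, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling1D(2, padding="same")(x)           # 500 -> 250
    x = layers.Conv1D(32, 5, padding="same", activation="relu")(x)
    x = layers.MaxPooling1D(2, padding="same")(x)           # 250 -> 125

    # Bottleneck: Dense projection to a single feature per time step
    encoded = layers.Dense(1, activation="relu")(x)         # (125, 1)

    # Decoder: Conv1D + UpSampling1D stacks mirror the encoder
    x = layers.Conv1D(32, 5, padding="same", activation="relu")(encoded)
    x = layers.UpSampling1D(2)(x)                            # 125 -> 250
    x = layers.Conv1D(64, 5, padding="same", activation="relu")(x)
    x = layers.UpSampling1D(2)(x)                            # 250 -> 500
    outputs = layers.Conv1D(1, 5, padding="same", activation="sigmoid")(x)

    return models.Model(inputs, outputs)
```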
Adam was chosen as the optimizer, with MSE loss and MAE as an additional metric.
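The corresponding compile step, using the hypothetical build_cnn_autoencoder from the sketch above, would look roughly like:

```python
model = build_cnn_autoencoder()
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.summary()
```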
Building a CNN model involves many hyper-parameters; among others, we must decide on the filter sizes. Each filter operates on just one channel, and the initial dataset has 61 channels in total.
- Window size — 0.5 s (chosen from 0.05 s to 5 s) is the most suitable variant, as the target (emotions) changes quickly, but not instantaneously.
- Number of filters — 64 and 32 (chosen from 16 to 128). The number of filters affects the number of model parameters, and since our dataset is comparatively small, too many parameters would cause the model to overfit.
- Filter length — 5 (chosen from 2 to 100), as it performed best.
- Dropout coefficient — 0.4
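Collected in one place, the chosen values and search ranges might be summarized as follows (the 1 kHz sampling rate is only inferred from 500 samples per 0.5 s window and is an assumption):

```python
# Hypothetical summary of the hyper-parameters described above.
SAMPLING_RATE_HZ = 1000          # assumption: 500 samples per 0.5 s window
HPARAMS = {
    "window_size": int(0.5 * SAMPLING_RATE_HZ),  # searched over 0.05 s .. 5 s
    "nb_filters": (64, 32),                      # searched over 16 .. 128
    "filter_length": 5,                          # searched over 2 .. 100
    "dropout": 0.4,
}
```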
We use a batch_generator for model fitting: on each iteration the model takes a random window-sized batch from a random file, and batches do not repeat within one epoch. There are 31 files in total (300 s to 1000 s long), with 1 file held out for validation.
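A hypothetical sketch of such a generator; the file format, data layout, and single-channel sampling are assumptions, and the bookkeeping that prevents batches from repeating within an epoch is omitted:

```python
import numpy as np

def batch_generator(files, window_size=500, batch_size=32, seed=None):
    """Sketch of the batch generator described above: each iteration yields a
    batch of random window-sized slices taken from a random file."""
    rng = np.random.default_rng(seed)
    while True:
        path = rng.choice(files)               # pick a random recording
        data = np.load(path)                   # assumed shape: (n_samples, n_channels)
        channel = rng.integers(data.shape[1])  # the model sees one channel at a time
        starts = rng.integers(0, data.shape[0] - window_size, size=batch_size)
        batch = np.stack([data[s:s + window_size, channel] for s in starts])[..., None]
        yield batch, batch                     # autoencoder: input == target
```

Training then uses something like model.fit(batch_generator(train_files), steps_per_epoch=..., validation_data=batch_generator(val_files), validation_steps=...).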
Further work:
- More experiments with hyper-parameters
- More experiments with loss functions and metrics
- Compare performance against an RNN and run the experiments once more.
EEG Emotion Recognition - Soboleva & Glazkova - 2018