Ensemble and personalized transformer models for subject identification and relapse detection in e-Prevention Challenge
Salvatore Calcagno, Raffaele Mineo, Daniela Giordano, Concetto Spampinato
Official PyTorch implementation of the paper "Ensemble and personalized transformer models for subject identification and relapse detection in e-Prevention Challenge".
We present the devised solutions for subject identification and relapse detection of the e-Prevention Challenge hosted at the ICASSP 2023 conference [1] [2] [3]. We specifically design an ensemble scheme of six models - five transformer-based ones and a CNN model - for the identification of subjects from wearable devices, while a personalized - one for each subject - scheme is used for relapse detection in psychotic disorder. Our final submitted solutions yield top performance on both tracks of the challenge: we ranked second on the subject identification task (with an accuracy of 93.85%) and first on the relapse detection task (with a ROC-AUC and PR-AUC of about 0.65).
The architectures used in the ensemble model are listed below.
| Model Type | Architecture Details | Training Setting |
| --- | --- | --- |
| CNN (Transformer Ablation) | 5 convolutional layers (conv1D, ReLU, BatchNorm, Dropout), AdaptiveAvgPool1d, Time2Vec, Fully Connected Classification Head | batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 1e-4, factor 0.5, patience 10 epochs) |
| Transformer | Embedding (5 convolutional layers, AdaptiveAvgPool1d); Positional Embedding (sin and cos encoding); Transformer Encoder (model depth 128, nlayers 2, nhead 2, d_hid 512); Fully Connected Classification Head | batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 5e-4, factor 0.5, patience 10 epochs) |
| Transformer | Embedding (5 convolutional layers, AdaptiveAvgPool1d); Positional Embedding (Time2Vec); Transformer Encoder (model depth 32, nlayers 2, nhead 2, d_hid 128); Fully Connected Classification Head | batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 5e-4, factor 0.5, patience 10 epochs) |
| Transformer | Embedding (5 convolutional layers, AdaptiveAvgPool1d); Positional Embedding (Time2Vec); Transformer Encoder (model depth 32, nlayers 2, nhead 2, d_hid 128); Fully Connected Classification Head | batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 5e-4, factor 0.5, patience 10 epochs) |
| Transformer | Embedding (5 convolutional layers, AdaptiveAvgPool1d); Positional Embedding (Time2Vec); Transformer Encoder (model depth 32, nlayers 2, nhead 2, d_hid 768); Fully Connected Classification Head | batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 5e-4, factor 0.5, patience 10 epochs) |
| Transformer | Embedding (5 convolutional layers, AdaptiveAvgPool1d); Positional Embedding (Time2Vec); Transformer Encoder (model depth 128, nlayers 2, nhead 2, d_hid 768); Fully Connected Classification Head | batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 5e-4, factor 0.5, patience 10 epochs) |
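Most ensemble members use a Time2Vec positional embedding [4], which maps a scalar time value to a vector whose first component is linear and whose remaining components are periodic. The following is a minimal plain-Python sketch of that mapping, not the repository's PyTorch layer: the function name `time2vec` and the fixed `omega` and `phi` values are illustrative assumptions, whereas in a trained model they would be learned parameters.

```python
import math

def time2vec(tau, omega, phi):
    """Map scalar time `tau` to a vector: component 0 is linear, the rest use sin()."""
    if len(omega) != len(phi):
        raise ValueError("omega and phi must have the same length")
    out = []
    for i, (w, p) in enumerate(zip(omega, phi)):
        z = w * tau + p
        # First component keeps the linear term; the others are periodic.
        out.append(z if i == 0 else math.sin(z))
    return out

# Hypothetical frequencies and phases (learned during training in a real model).
omega = [1.0, 0.5, 2.0, 0.1]
phi = [0.0, 0.0, 0.0, 0.0]
embedding = time2vec(3.0, omega, phi)
print(embedding)
```

The linear component lets the model represent non-periodic progression of time, while the sine components can capture repeating patterns such as daily rhythms in wearable signals.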
Best configurations were found using grid search for each subject:
For CNN-based models we tested the following parameters:
"parameters": {
"subject": {"values": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]},
"data_type": {"values": ["aggregated", "raw"]},
"learning_rate": {"values": [5e-3, 5e-4, 5e-5]},
"enable_variational": {"values": [0, 1]},
"model": {"values": ["cnn1d_autoencoder", "volund"]}
For Transformer-based models we tested the following parameters:
"parameters": {
"subject": {"values": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]},
"data_type": {"values": ["aggregated"]},
"learning_rate": {"values": [5e-3, 5e-4, 5e-5]},
"enable_variational": {"values": [0, 1]},
"model": {"values": ["transformer_autoencoder"]},
"d_model": {"values": [32, 64, 128]},
"nhead": {"values": [4, 8, 16]},
"nlayers": {"values": [2, 4]},
We show below the employed architectures for each subject.
| Subject | Model Type | Architecture Details | Training Setting |
| --- | --- | --- | --- |
| 0 | Transformer | Embedding (linear projection); Positional Embedding (sin and cos encoding); Transformer Encoder (model depth 32, nlayers 2, nhead 8, d_hid 2048); Transformer Decoder (model depth 32, nlayers 2, nhead 8, d_hid 2048); Linear Mapping | data type aggregated; batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 5e-3, factor 0.5, patience 10 epochs) |
| 1 | Transformer | Embedding (linear projection); Positional Embedding (sin and cos encoding); Transformer Encoder (model depth 128, nlayers 2, nhead 16, d_hid 2048); Transformer Decoder (model depth 128, nlayers 2, nhead 16, d_hid 2048); Linear Mapping | data type aggregated; batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 5e-4, factor 0.5, patience 10 epochs) |
| 2 | CNN | CNN Encoder: 5 convolutional layers (conv1D, ReLU, BatchNorm, Dropout); Bottleneck: conv1D, ReLU; CNN Decoder: 5 transposed convolutional layers (convTranspose1D, ReLU, BatchNorm, Dropout) | data type aggregated; batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 5e-3, factor 0.5, patience 10 epochs) |
| 3 | Transformer | Embedding (linear projection); Positional Embedding (sin and cos encoding); Transformer Encoder (model depth 32, nlayers 2, nhead 4, d_hid 2048); Transformer Decoder (model depth 32, nlayers 2, nhead 4, d_hid 2048); Linear Mapping | data type aggregated; batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 5e-4, factor 0.5, patience 10 epochs) |
| 4 | Transformer | Embedding (linear projection); Positional Embedding (sin and cos encoding); Transformer Encoder (model depth 32, nlayers 2, nhead 8, d_hid 2048); Transformer Decoder (model depth 32, nlayers 2, nhead 8, d_hid 2048); Linear Mapping | data type aggregated; batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 5e-3, factor 0.5, patience 10 epochs) |
| 5 | Volund | | data type raw; batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 5e-3, factor 0.5, patience 10 epochs) |
| 6 | CNN | CNN Encoder: 5 convolutional layers (conv1D, ReLU, BatchNorm, Dropout); Bottleneck: conv1D, ReLU; CNN Decoder: 5 transposed convolutional layers (convTranspose1D, ReLU, BatchNorm, Dropout); Linear Mapping | data type raw; batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 5e-3, factor 0.5, patience 10 epochs) |
| 7 | Transformer | Embedding (linear projection); Positional Embedding (sin and cos encoding); Transformer Encoder (model depth 32, nlayers 2, nhead 8, d_hid 2048); Transformer Decoder (model depth 32, nlayers 2, nhead 8, d_hid 2048); Linear Mapping | data type aggregated; batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 5e-3, factor 0.5, patience 10 epochs) |
| 8 | Transformer | Embedding (linear projection); Positional Embedding (sin and cos encoding); Transformer Encoder (model depth 128, nlayers 2, nhead 8, d_hid 2048); Transformer Decoder (model depth 128, nlayers 2, nhead 8, d_hid 2048); Linear Mapping | data type aggregated; batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 5e-3, factor 0.5, patience 10 epochs) |
| 9 | Transformer | Embedding (linear projection); Positional Embedding (sin and cos encoding); Transformer Encoder (model depth 128, nlayers 2, nhead 8, d_hid 2048); Transformer Decoder (model depth 128, nlayers 2, nhead 8, d_hid 2048); Linear Mapping | data type aggregated; batch size 64; Adam optimizer; scheduler reduceLROnPlateau (initial learning rate 5e-3, factor 0.5, patience 10 epochs) |
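To get a feel for the size of the grid search that produced these per-subject choices, the search spaces listed earlier can be expanded with `itertools.product`. This is only a sketch: it covers just the parameters shown in this README (the actual sweep configuration may contain more), it leaves out the `subject` axis, and the helper name `expand` is hypothetical.

```python
from itertools import product

# Search spaces copied from the grid-search parameters listed in this README.
cnn_grid = {
    "data_type": ["aggregated", "raw"],
    "learning_rate": [5e-3, 5e-4, 5e-5],
    "enable_variational": [0, 1],
    "model": ["cnn1d_autoencoder", "volund"],
}
transformer_grid = {
    "data_type": ["aggregated"],
    "learning_rate": [5e-3, 5e-4, 5e-5],
    "enable_variational": [0, 1],
    "model": ["transformer_autoencoder"],
    "d_model": [32, 64, 128],
    "nhead": [4, 8, 16],
    "nlayers": [2, 4],
}

def expand(grid):
    """Return every combination of the grid values as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

cnn_runs = expand(cnn_grid)
transformer_runs = expand(transformer_grid)

# Multi-head attention needs d_model to be divisible by nhead.
valid = all(run["d_model"] % run["nhead"] == 0 for run in transformer_runs)
print(len(cnn_runs), len(transformer_runs), valid)
```

Under these assumptions each subject has 24 CNN-based and 108 Transformer-based candidate configurations, and every listed `d_model`/`nhead` pair is compatible.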
- NVIDIA GPU (tested on NVIDIA A6000 GPUs)
- Wandb account (change entity and project name in scripts)
- The datasets provided for track 1 and track 2 should be placed in ../datasets
- Requirements
To start training, simply run the following commands. Each command shows a model configuration, which will be used in the ensemble during validation and test. Please note that the first two commands are the same, since the same model was used with a weight of 2 in the voting scheme.
python train_track1.py --window_size 2160 --model transformer --d_model 32 --nhead 2 --d_hid 128 --nlayers 2 --learning_rate 5e-4 --enable_scheduler 1 --batch_size 64 --split_path data/track1/width3_stride3 --data_dir data/track1/width3_stride3
python train_track1.py --window_size 2160 --model transformer --d_model 32 --nhead 2 --d_hid 128 --nlayers 2 --learning_rate 5e-4 --enable_scheduler 1 --batch_size 64 --split_path data/track1/width3_stride3 --data_dir data/track1/width3_stride3
python train_track1.py --window_size 2160 --model transformer --d_model 32 --nhead 2 --d_hid 768 --nlayers 2 --learning_rate 5e-4 --enable_scheduler 1 --batch_size 64 --split_path data/track1/width3_stride3 --data_dir data/track1/width3_stride3
python train_track1.py --window_size 1080 --model transformer --d_model 128 --nhead 2 --d_hid 768 --nlayers 2 --learning_rate 5e-4 --enable_scheduler 1 --batch_size 64 --split_path data/track1/width1_5_stride1_5 --data_dir data/track1/width1_5_stride1_5
python train_track1.py --window_size 2160 --model transformer_ablation_time2vec --d_model 128 --nhead 2 --d_hid 512 --nlayers 2 --learning_rate 5e-4 --enable_scheduler 1 --batch_size 64 --split_path data/track1/width3_stride3 --data_dir data/track1/width3_stride3
python train_track1.py --window_size 2160 --model transformer_ablation --learning_rate 1e-4 --enable_scheduler 1 --batch_size 64 --split_path data/track1/width3_stride3 --data_dir data/track1/width3_stride3
The code expects a txt file ensemble.txt with the list of names of models (the structure is shown below). The file should be placed in the root directory.
Model names YYYY-MM-DD_hh-mm-ss_<model*> should be retrieved from the directory list in the experiments folder, after training.
Run the following to retrieve accuracies of single and ensemble models on the provided validation set.
python test_track1.py --split val
Use --split test if you want to obtain predictions over the test samples. Predictions will be saved into a file named test_track1.csv. We don't have the ground truth for this split.
The default ensemble scheme is sum. You can use the --scheme argument if you want to change the ensemble scheme. Allowed schemes are min, max and sum.
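The three schemes can be illustrated with a small plain-Python sketch. It assumes that each model outputs one score per class and that the scheme is applied element-wise across models before taking the highest-scoring class; the function name `ensemble_predict` and the example scores are hypothetical and are not taken from test_track1.py.

```python
def ensemble_predict(model_scores, scheme="sum"):
    """Combine per-model class scores element-wise and return the winning class index."""
    reducers = {"sum": sum, "min": min, "max": max}
    if scheme not in reducers:
        raise ValueError(f"unknown scheme: {scheme}")
    reduce_fn = reducers[scheme]
    # zip(*model_scores) groups the scores that all models gave to the same class.
    combined = [reduce_fn(class_scores) for class_scores in zip(*model_scores)]
    return max(range(len(combined)), key=combined.__getitem__)

# Hypothetical class scores from three models for a single sample (3 classes).
scores_a = [0.7, 0.2, 0.1]
scores_b = [0.2, 0.5, 0.3]
scores_c = [0.1, 0.5, 0.4]

# Listing model A twice gives it a weight of 2 under the sum scheme.
weighted = [scores_a, scores_a, scores_b, scores_c]
unweighted = [scores_a, scores_b, scores_c]

pred_sum_weighted = ensemble_predict(weighted, "sum")
pred_sum_unweighted = ensemble_predict(unweighted, "sum")
pred_min = ensemble_predict(weighted, "min")
pred_max = ensemble_predict(weighted, "max")
print(pred_sum_weighted, pred_sum_unweighted, pred_min, pred_max)
```

In this sketch, repeating a model only changes the outcome for sum; min and max are unaffected by duplicates.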
To start training, simply run the following commands.
python train_track2.py --subject 0 --model transformer_autoencoder --d_model 32 --nhead 8 --nlayers 2 --data_type aggregated --learning_rate 5e-3
python train_track2.py --subject 1 --model transformer_autoencoder --d_model 128 --nhead 16 --nlayers 2 --data_type aggregated --learning_rate 5e-4
python train_track2.py --subject 2 --model cnn1d_autoencoder --data_type aggregated --learning_rate 5e-3
python train_track2.py --subject 3 --model transformer_autoencoder --d_model 32 --nhead 4 --nlayers 2 --data_type aggregated --learning_rate 5e-4
python train_track2.py --subject 4 --model transformer_autoencoder --d_model 32 --nhead 8 --nlayers 2 --data_type aggregated --learning_rate 5e-3
python train_track2.py --subject 5 --model volund --data_type raw --learning_rate 5e-3
python train_track2.py --subject 6 --model cnn1d_autoencoder --data_type raw --learning_rate 5e-3
python train_track2.py --subject 7 --model transformer_autoencoder --d_model 128 --nhead 8 --nlayers 2 --data_type aggregated --learning_rate 5e-3
python train_track2.py --subject 8 --model transformer_autoencoder --d_model 128 --nhead 8 --nlayers 2 --data_type aggregated --learning_rate 5e-3
python train_track2.py --subject 9 --model transformer_autoencoder --d_model 128 --nhead 8 --nlayers 2 --data_type aggregated --learning_rate 5e-3
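Since the per-subject commands differ only in a few flags, they can also be generated from a small table of settings. The sketch below only builds and prints the command strings and does not launch any training; the helper names `transformer`, `other` and `build_command` are hypothetical, and the values are copied from the commands above.

```python
import shlex

def transformer(d_model, nhead, learning_rate):
    # Every transformer command above uses nlayers 2 and aggregated data.
    return {"model": "transformer_autoencoder", "d_model": d_model, "nhead": nhead,
            "nlayers": 2, "data_type": "aggregated", "learning_rate": learning_rate}

def other(model, data_type, learning_rate):
    return {"model": model, "data_type": data_type, "learning_rate": learning_rate}

# Learning rates are kept as strings so they print exactly as in the commands.
configs = {
    0: transformer(32, 8, "5e-3"),
    1: transformer(128, 16, "5e-4"),
    2: other("cnn1d_autoencoder", "aggregated", "5e-3"),
    3: transformer(32, 4, "5e-4"),
    4: transformer(32, 8, "5e-3"),
    5: other("volund", "raw", "5e-3"),
    6: other("cnn1d_autoencoder", "raw", "5e-3"),
    7: transformer(128, 8, "5e-3"),
    8: transformer(128, 8, "5e-3"),
    9: transformer(128, 8, "5e-3"),
}

def build_command(subject, cfg):
    """Turn one subject's settings into a train_track2.py command line."""
    args = ["python", "train_track2.py", "--subject", str(subject)]
    for flag, value in cfg.items():
        args += [f"--{flag}", str(value)]
    return shlex.join(args)

commands = [build_command(subject, cfg) for subject, cfg in configs.items()]
for command in commands:
    print(command)
```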
The code expects a txt file best_models.txt with the list of names of models, the same as for the first track. The file should be placed in the root directory.
Run the following to retrieve the performance (ROC-AUC, PR-AUC and the harmonic mean of the two) of single models on the provided validation set.
python test_track2.py --split val
Use --split test if you want to obtain predictions over the test samples. Predictions will be saved into a file named test_track2.csv. We don't have the ground truth for this split.
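For reference, the validation metrics mentioned above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in test_track2.py: label 1 is taken to mean a relapse sample, higher scores mean "more anomalous", PR-AUC is approximated by average precision (the challenge's exact definition may differ), and all function names are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values at each positive, ranked by decreasing score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive sample")
    return total / hits

def harmonic_mean(a, b):
    """Harmonic mean of two non-negative scores."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)

# Toy example: two normal samples (0) and two relapse samples (1).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
combined = harmonic_mean(auc, ap)
print(auc, ap, combined)
```

The harmonic mean rewards models that do well on both metrics, since it is pulled towards the lower of the two values.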
[1] A Zlatintsi, P P Filntisis, C Garoufis, N Efthymiou, P Maragos, A Menychtas, I Maglogiannis, et al., “E-prevention: Advanced support system for monitoring and relapse prevention in patients with psychotic disorders analyzing long-term multimodal data from wearables and video captures,” Sensors, vol. 22, no. 19, 2022.
[2] G Retsinas, P P Filntisis, N Efthymiou, E Theodosis, A Zlatintsi, and P Maragos, “Person identification using deep convolutional neural networks on short-term signals from wearable sensors,” in ICASSP. IEEE, 2020.
[3] M Panagiotou, A Zlatintsi, P P Filntisis, A J Roumeliotis, N Efthymiou, and P Maragos, “A comparative study of autoencoder architectures for mental health analysis using wearable sensors data,” in EUSIPCO. IEEE, 2022.
[4] S M Kazemi, R Goel, S Eghbali, J Ramanan, J Sahota, S Thakur, S Wu, C Smyth, P Poupart, and M Brubaker, “Time2vec: Learning a vector representation of time,” arXiv preprint arXiv:1907.05321, 2019.