This repository contains several models for classifying the reduced WISDM dataset.
Neural networks are used for feature extraction and classification.
They were implemented in Python using the PyTorch library; the most recent networks were implemented in TensorFlow. All files and folders with "_tf" or "_TF" in their name belong to the TensorFlow implementation.
This repository is based on a Kaggle Competition. The website for this Competition can be found here.
The task is the classification of biometric time-series data. The dataset is the "WISDM Smartphone and Smartwatch Activity and Biometrics Dataset"; WISDM stands for Wireless Sensor Data Mining. The dataset was created by the Department of Computer and Information Science at Fordham University in New York. The researchers collected data from the accelerometer and gyroscope sensors of a smartphone and a smartwatch while 51 subjects performed 18 diverse activities of daily living. Each activity was performed for 3 minutes, so each subject contributed 54 minutes of data.
A detailed description of the dataset is also included in this repo. However, if you would like to view the original data, you can find the complete dataset here.
As already mentioned, a reduced dataset is used, which contains the following six activities:
A - walking
B - jogging
C - climbing stairs
D - sitting
E - standing
M - kicking soccer ball
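As a point of reference, a mapping of these activity codes to human-readable names and integer class indices could look like the snippet below. This is illustrative only; the actual encoding lives in datasets.py / dataset_tf.py and may differ.

```python
# Illustrative mapping of the reduced WISDM activity codes; the actual
# encoding used in datasets.py / dataset_tf.py may differ.
ACTIVITY_NAMES = {
    "A": "walking",
    "B": "jogging",
    "C": "climbing stairs",
    "D": "sitting",
    "E": "standing",
    "M": "kicking soccer ball",
}

# Assign each activity code an integer label for the classifiers.
LABEL_TO_INDEX = {code: idx for idx, code in enumerate(sorted(ACTIVITY_NAMES))}
# -> {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4, "M": 5}
```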
In addition to the eleven different neural networks, the repository also includes the training procedures and data pre-processing scripts.
Models (neural networks):
- PyTorch
  - Linear / Multilayer Perceptron (MLP) model
  - Convolutional Neural Network (CNN) 1D model
  - Gated Recurrent Unit (GRU) model, a type of Recurrent Neural Network (RNN)
  - CNN 2D model
  - Long Short-Term Memory (LSTM) model
- TensorFlow
  - MLP model
  - CNN 2D model
  - GRU model
  - LSTM model
  - Big GRU model
  - Convolutional LSTM model
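To give a feel for the PyTorch side, below is a minimal MLP sketch. The layer sizes and input layout are assumptions for illustration and do not reproduce the exact MLP_NET_V1 defined in models.py.

```python
import torch
import torch.nn as nn

class SimpleMLP(nn.Module):
    """Illustrative MLP for windowed sensor data; layer sizes are assumptions,
    not the configuration of MLP_NET_V1 in models.py."""

    def __init__(self, n_features: int, n_classes: int = 6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),               # (batch, channels, window) -> (batch, channels * window)
            nn.Linear(n_features, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),  # one logit per activity class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Example: windows of 100 time steps with 6 sensor channels
# (the channel count is an assumption for illustration).
model = SimpleMLP(n_features=6 * 100)
logits = model(torch.randn(32, 6, 100))   # -> shape (32, 6)
```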
File / Folder | Description |
---|---|
Datasets/ | contains the data and the submissions |
Models/ | contains the trained models |
Plots/ | contains all plots from the training and testing |
.gitignore | contains files and folders that are not tracked via git |
dataset_tf.py | provides the dataset and prepares the data for TensorFlow |
datasets.py | provides the dataset and prepares the data for PyTorch |
helpers.py | provides auxiliary classes and functions for neural networks |
Job.sh | provides a script to carry out the training on a computer cluster |
models_tf.py | provides the models for TensorFlow |
models.py | provides the models for PyTorch |
train_tf.py | provides functions for training and testing for TensorFlow |
train.py | provides functions for training and testing for PyTorch |
WISDM-dataset-description.pdf | further description of the dataset |
The scores were calculated by Kaggle. The metric is the categorization accuracy (ACC).
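Categorization accuracy is simply the fraction of predictions that match the true activity label; a minimal sketch of the metric (the helper name below is hypothetical):

```python
import numpy as np

def categorization_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    return float(np.mean(y_true == y_pred))

# Example: 4 out of 5 windows classified correctly -> ACC = 0.8
print(categorization_accuracy(np.array([0, 1, 2, 3, 5]),
                              np.array([0, 1, 2, 3, 4])))
```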
Models | Public leaderboard score | Training time (hh:mm:ss) | Parameters of the model |
---|---|---|---|
MLP_NET_V1 | 0.45856 | 00:05:22 | 902 |
CNN_NET_V1 | 0.51933 | 00:21:17 | 141,766 |
GRU_NET | 0.00000 | n/a (PyTorch GRU did not work) | 0 |
CNN_NET_V2 | 0.85635 | 00:01:28 | 134,134 |
LSTM_NET | 0.83425 | 00:16:16 | 529,926 |
MLP_NET_TF | 0.90055 | 00:08:20 | 112,262 |
CNN_NET_TF | 0.87845 | 00:06:18 | 1,641,030 |
GRU_NET_TF | 0.89502 | 00:18:55 | 4,175,238 |
LSTM_NET_TF | 0.88950 | 00:19:04 | 4,470,150 |
GRU_NET_BIG_TF | 0.95027 | 00:22:47 | 10,621,830 |
CONV_LSTM_NET_TF | 0.93370 | 00:35:53 | 14,721,926 |
The two models GRU_NET_BIG_TF and CONV_LSTM_NET_TF were trained on an extended dataset: three new features were added through feature engineering, namely the Fast Fourier Transform (FFT) of the individual signals.
In addition, these two models were trained on data created with a sliding window of size 200, while all other models used a window size of 100.
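A rough sketch of this pre-processing (sliding-window segmentation plus FFT-based features) is shown below; the window length, stride, channel count, and sampling rate are assumptions for illustration and do not mirror the exact logic in datasets.py / dataset_tf.py.

```python
import numpy as np

def make_windows(signal: np.ndarray, window: int = 200, step: int = 200) -> np.ndarray:
    """Cut a (time, channels) signal into windows of length `window`.
    The step size is an assumption; the repository scripts may use a different stride."""
    n = (len(signal) - window) // step + 1
    return np.stack([signal[i * step : i * step + window] for i in range(n)])

def add_fft_features(windows: np.ndarray) -> np.ndarray:
    """Append the FFT magnitude of each channel as extra feature channels."""
    fft_mag = np.abs(np.fft.fft(windows, axis=1))       # same length as the window
    return np.concatenate([windows, fft_mag], axis=-1)  # (n_windows, window, 2 * channels)

# Example: 10 minutes of tri-axial accelerometer data at an assumed 20 Hz
raw = np.random.randn(12000, 3)
windows = make_windows(raw, window=200)
features = add_fft_features(windows)
print(windows.shape, features.shape)   # (60, 200, 3) (60, 200, 6)
```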
The best model is therefore the GRU_NET_BIG_TF with an accuracy of 95.027%.