This repository accompanies the publication "Mental Wellbeing at Sea: a Prototype to Collect Speech Data in Maritime Settings" at HEALTHINF 2025:
Pascal Hecker, Monica Gonzalez-Machorro, Hesam Sagha, Saumya Dudeja, Matthias Kahlau, Florian Eyben, Björn W. Schuller, Bert Arnrich
In Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2025) - Volume 2: HEALTHINF, pages 29-40
ISBN: 978-989-758-731-3; ISSN: 2184-4305
The figures presented in the publication reside in the `figures/` folder; most of them were composed with the Jupyter notebook `paper-plots_survey_responses.ipynb`.
The central table with the significantly correlating features is `paper-significantly_correlating_features.csv`.
The central table for the statistical modelling approaches is `compiled-merged_denoised_noisy-paper.ods`; it is converted into the LaTeX table in the publication with the Jupyter notebook `paper-compose_main_modelling_table.ipynb`.
Unfortunately, we cannot share the data due to privacy constraints. The source in this repository was used to run the analyses presented in the publication and should provide some valuable means to check the routines applied.
To utilise the source code provided in this repository, preferably use a virtual environment manager of your choice and run `pip install -r requirements-freeze-devaice.txt`.
devAIce is a commercial framework that provides the voice activity detection (VAD) and the signal-to-noise ratio (SNR) prediction in this study.
Without a respective devAIce license, you can run `pip install -r requirements-freeze.txt` instead, but you will have to implement alternative solutions for these functionalities.
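As a rough placeholder for the VAD functionality, a simple energy-based voice activity detector can be sketched with `numpy`. Note that the frame sizes and threshold below are illustrative assumptions, and such a heuristic is not of comparable quality to devAIce's VAD:

```python
import numpy as np

def simple_energy_vad(signal, sr, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Rough energy-based VAD: flag frames whose RMS energy (relative to
    the file's peak) exceeds a threshold as speech."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    peak = np.max(np.abs(signal)) + 1e-12
    flags = []
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = signal[start:start + frame]
        rms = np.sqrt(np.mean(chunk ** 2))
        level_db = 20 * np.log10(rms / peak + 1e-12)
        flags.append(level_db > threshold_db)
    return np.array(flags)

# Example: 1 s of silence followed by 1 s of a 220 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
audio = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 220 * t)])
voiced = simple_energy_vad(audio, sr)
```

For a more robust drop-in alternative, open-source VAD models (e.g., from speech toolkits) are worth considering.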
Python version 3.8.10 was used in this project.
The main modelling pipeline launcher script resides in `src/main.py`.
Simply launch it by running `python main.py`.
In `src/experiment_configs/`, you can find designated configuration files for particular experiment runs.
A config can be passed to the main script on the command line, such as: `python main.py experiment_configs/mental_wellbeing_at_sea/eGeMAPSv02-target_norm-no_denoising.yaml`.
The section on the experiment configs used below lists all configuration files that were used to obtain the results presented in the publication.
The results will be saved in a `results/mwas/modelling` folder and contain a nested folder structure that encodes the respective experiment settings.
After several experiment runs, you can adjust and execute `src/collect_results.py`.
It finds all `results-compiled.yaml` files and adds their evaluation metrics to a `.csv` file saved in `results/mwas/composed`.
In the script, you have to manually set the target variable whose results you want to collect. You can further filter by any string contained in the results paths that specifies the respective run (e.g., `"type-no_feature_selection"` for all models for which no feature selection was performed). This is implemented in the `search_term = {"term": None, "name": "everything"}` dictionary, where `term` would be `"type-no_feature_selection"` and `name` can be chosen freely as a recognisable identifier in the `results/mwas/composed` folder hierarchy.
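The path-filtering logic behind `search_term` can be sketched as follows; this is a simplified illustration, not the exact code in `src/collect_results.py`:

```python
def filter_runs(paths, term=None):
    """Keep only result paths that contain `term`; term=None keeps everything.
    Sketch of the search_term filtering in collect_results.py."""
    if term is None:
        return list(paths)
    return [p for p in paths if term in p]

# Illustrative result paths as produced by the modelling pipeline.
runs = [
    "results/mwas/modelling/type-no_feature_selection/results-compiled.yaml",
    "results/mwas/modelling/type-correlation/results-compiled.yaml",
]
kept = filter_runs(runs, "type-no_feature_selection")
```

In the actual script, the candidate paths would come from recursively globbing the `results/mwas/modelling` tree for `results-compiled.yaml` files.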
With that `.csv` file, you can then, e.g., open it in LibreOffice, select everything (Ctrl + A) → "Data" → "Sort" → set "Sort Key 1" to the column with the metric you find most meaningful (e.g., "Column B" for CCC) and select "Descending".
That way, you get all your models sorted by their performance!
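Alternatively to LibreOffice, the same sorting can be done with pandas. The column names below are illustrative; use the ones that actually appear in your composed `.csv`:

```python
import pandas as pd

# Hypothetical composed results table; in practice you would load the
# .csv from results/mwas/composed/ with pd.read_csv(...).
df = pd.DataFrame({
    "path": ["model_a", "model_b", "model_c"],
    "CCC": [0.31, 0.52, 0.44],
})

# Sort all models by the chosen metric, best first (the LibreOffice
# steps above, done programmatically).
ranked = df.sort_values("CCC", ascending=False).reset_index(drop=True)
```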
Then, select your best performing model, or any other model you want to inspect, and scroll to the "path" column. Use that path to navigate (e.g., using `cd` from `results/mwas/modelling/`) to the model directory and check out the plots in the folder for regression plots of the train and test partition predictions.
Each model result folder contains a `data` folder, which in turn contains `df_results_train.parquet.zstd` and `df_results_test.parquet.zstd`. These parquet files (chosen for their good compression) can be read with `df = pd.read_parquet('df_results_test.parquet.zstd', engine='pyarrow')`; `pyarrow` will already be installed through the requirements.
These results DataFrames contain the predicted and ground truth labels, as well as several other useful columns such as the speaker ID and the index of the outer CV fold.
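As an example of working with these DataFrames, a metric such as the concordance correlation coefficient (CCC) can be recomputed from the predicted and ground truth labels. Note that the column names below (`label`, `prediction`) are assumptions for illustration, so check the actual parquet files for the names used in this repository:

```python
import numpy as np
import pandas as pd

def ccc(y_true, y_pred):
    """Concordance correlation coefficient (Lin, 1989)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

# Toy stand-in for a results DataFrame loaded via pd.read_parquet(...).
df = pd.DataFrame({"label": [1.0, 2.0, 3.0, 4.0],
                   "prediction": [1.1, 1.9, 3.2, 3.8]})
score = ccc(df["label"], df["prediction"])
```

The speaker ID and outer CV fold columns allow grouping such evaluations per speaker or per fold.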
If more in-depth debugging is required, the following option in the experiment configuration saves even further data:
    ModelTrainer:
      meta:
        save_full_data: True
This will also save the filtered feature and label DataFrames, to check, e.g., how many feature columns were dropped through feature selection, or how the feature values were normalized.
The Jupyter notebook `bootstrapping-bulk_apply_confidence_intervals_to_results.ipynb` was used to calculate the confidence intervals with the `confidence_intervals` package.
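Conceptually, the percentile bootstrap behind such confidence intervals looks like the sketch below; this is a generic illustration with synthetic data, not the `confidence_intervals` package's API:

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a paired metric:
    resample (truth, prediction) pairs with replacement, recompute the
    metric, and take the empirical alpha/2 and 1 - alpha/2 quantiles."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        stats.append(metric(y_true[idx], y_pred[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Synthetic example: predictions = truth plus noise, CI for Pearson's r.
rng = np.random.default_rng(1)
truth = rng.normal(size=200)
pred = truth + rng.normal(scale=0.3, size=200)
lo, hi = bootstrap_ci(truth, pred, lambda a, b: np.corrcoef(a, b)[0, 1])
```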
To compare the performance of some denoising methods, a model to estimate the SNR level of the individual audio files was employed in `audio_quality-snr_filtering.ipynb`. In the publication, we discard files with an SNR value < 7. The respective files to filter out were copied to `audio_quality-filter_samples.ipynb`, and that notebook is processed in the main modelling pipeline in `src/main.py#L317`.
For denoising, the "causal speech enhancement model" (publication, repository) was employed, decoupled from this repository. The resulting file hierarchy was passed back to the pipeline by pointing the configuration file field `path_data` to the denoised directory tree; as an example, see the configuration file `eGeMAPSv02-target_norm-facebook_denoiser-master64-converted_int16_dithering-filter_clipping_snr.yaml#8`.
The Jupyter notebook `check_clipping.ipynb` was employed to check if the denoised files still contain clipping. No denoised files are clipped, but 17 "noisy" files showed clipping.
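A clipping check of this kind can be sketched as looking for consecutive samples sitting at full scale; the threshold and run length below are illustrative assumptions, not necessarily those used in `check_clipping.ipynb`:

```python
import numpy as np

def is_clipped(signal, threshold=0.999, min_run=2):
    """Flag a file as clipped if at least `min_run` consecutive samples
    sit at (or very near) full scale, assuming float audio in [-1, 1]."""
    at_limit = np.abs(np.asarray(signal)) >= threshold
    run = 0
    for flag in at_limit:
        run = run + 1 if flag else 0
        if run >= min_run:
            return True
    return False

# Synthetic examples: a well-scaled sine vs. one driven past full scale.
clean = 0.5 * np.sin(np.linspace(0, 20, 1000))
clipped = np.clip(2.0 * np.sin(np.linspace(0, 20, 1000)), -1.0, 1.0)
```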
The experiment configuration files used to run the modelling for the publication are:

eGeMAPS features
- No denoising: `eGeMAPSv02-target_norm-no_denoising.yaml`
- Denoising and SNR-based filtering: `eGeMAPSv02-target_norm-facebook_denoiser-master64-converted_int16_dithering-filter_clipping_snr.yaml`

wav2vec 2.0 embeddings as features
- No denoising: `wav2vec2-large-robust-12-ft-emotion-msp-target_norm-dim-no_denoising.yaml`
- Denoising and SNR-based filtering: `wav2vec2-large-robust-12-ft-emotion-msp-dim-target_norm-facebook_denoiser-master64-converted_int16_dithering-filter_clipping_snr.yaml`
- No denoising: `wav2vec2-large-robust-ft-libri-960h-target_norm-no_denoising.yaml`
- Denoising and SNR-based filtering: `wav2vec2-large-robust-ft-libri-960h-target_norm-facebook_denoiser-master64-converted_int16_dithering-filter_clipping_snr.yaml`
- No denoising: `wav2vec2-large-xlsr-53-target_norm-no_denoising.yaml`
- Denoising and SNR-based filtering: `wav2vec2-large-xlsr-53-target_norm-facebook_denoiser-master64-converted_int16_dithering-filter_clipping_snr.yaml`
├── README.md <- The top-level README for developers using this project.
│
├── data
│ └── processed <- The final, canonical data sets for modeling.
│
│
├── notebooks <- Jupyter notebooks essential to this repository.
│ │
│ ├── data <- Data assisting the notebooks.
│ │
│ └── figures <- Generated graphics and figures used in the publication.
│
├── requirements-freeze.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
└── src <- Source code for use in this project.
│
├── data <- Scripts to download or generate data.
│
├── features <- Scripts to turn raw data into features for modeling.
│
└── models <- Scripts to train models and then use trained models to make
predictions.
Project based on the cookiecutter data science project template.