Subtle Variation in Sepsis-III Definitions Influences Predictive Performance of Machine Learning

The early detection of sepsis is a key research priority to help facilitate timely intervention. Criteria used to identify the onset time of sepsis from health records vary, hindering comparison and progress in this field. We considered the effects of variations in sepsis onset definition on the predictive performance of three representative models for early sepsis detection: light gradient boosting machine (LGBM), long short-term memory (LSTM), and Cox proportional-hazards (CoxPHM) models.

This repository is the official implementation of the paper entitled "Subtle Variation in Sepsis-III Definitions Influences Predictive Performance of Machine Learning".

This repository contains code for the following parts in our experimental pipeline:

1. Extracting the sepsis labels from the MIMIC-III data based on three sepsis criteria H1-3 and their variants (see src/database)
2. Training three types of models (i.e. LGBM, LSTM and CoxPHM) for early sepsis prediction on the datasets produced in Step 1 (see src/models)
3. Evaluating each trained model using the test metrics (e.g. AUROC) and producing the visualization plots (see src/visualization)

Environment Setup

The code has been tested successfully using Python 3.7; thus we suggest using this version or a later version of Python. A typical process for installing the package dependencies involves creating a new Python virtual environment.

To install the required packages, run the following:

pip install -r requirements.txt

Finally, to prepare the environment for running the code, run the following:

source pythonpath.sh

Data Extraction Pipeline

To train and evaluate our models, we will change the relational format of the MIMIC-III database to a pivoted view which includes key demographic information, vital signs, and laboratory readings. We will also create tables for the possible sepsis onset times of each patient. We will subsequently output the pivoted data to comma-separated value (CSV) files, which serve as input for model training and evaluation.
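As a rough illustration of what "pivoting" means here, the sketch below turns long-format measurement rows (one row per patient stay, time and variable) into a wide table and serialises it as CSV. It is not the repository's extraction code, which runs against the MIMIC-III database; the column names (icustay_id, charttime, label, value) and the helpers pivot_long_to_wide and to_csv are assumptions chosen for illustration, and the example values are made up.

```python
import csv
import io


def pivot_long_to_wide(rows, index=("icustay_id", "charttime"),
                       label_key="label", value_key="value"):
    """Pivot long-format rows into one wide row per index tuple."""
    wide = {}    # index tuple -> {variable name: value}
    labels = []  # variable names in order of first appearance
    for row in rows:
        key = tuple(row[k] for k in index)
        wide.setdefault(key, {})[row[label_key]] = row[value_key]
        if row[label_key] not in labels:
            labels.append(row[label_key])

    header = list(index) + labels
    table = []
    for key in sorted(wide):
        record = dict(zip(index, key))
        # Variables not measured at this time stay empty (no imputation here).
        for label in labels:
            record[label] = wide[key].get(label, "")
        table.append(record)
    return header, table


def to_csv(header, table):
    """Serialise the wide table to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()


# Hypothetical long-format input, not real MIMIC-III data.
long_rows = [
    {"icustay_id": 1, "charttime": "2100-01-01 00:00", "label": "heart_rate", "value": 88},
    {"icustay_id": 1, "charttime": "2100-01-01 00:00", "label": "lactate", "value": 2.1},
    {"icustay_id": 1, "charttime": "2100-01-01 01:00", "label": "heart_rate", "value": 92},
]
header, table = pivot_long_to_wide(long_rows)
csv_text = to_csv(header, table)
print(csv_text)
```

Each output row then holds all variables observed for one stay at one time, which is the shape that time-series models typically consume.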

Prior to running any of the data extraction commands, make sure to change to the src/database subdirectory:

cd src/database

Next, follow the instructions in the data extraction README.md, using the sections that match your setup: PostgreSQL installed directly on your machine, or PostgreSQL running in a Docker container.

Model Training and Testing Pipeline

Feature Extraction

To generate the derived features mentioned in our paper, simply run the following:

python3 src/features/generate_features.py

The preceding command will save features required for model training/tuning/evaluation to data/processed.

Model tuning/training/evaluation

Initiate model tuning, training and evaluation using the main.py script. This script takes four optional arguments: --model, --step, --n_cpus, and --n_gpus:

python3 src/models/main.py --model MODEL_NAME --step STEP_NAME --n_cpus N_CPUS --n_gpus N_GPUS

where MODEL_NAME is one of LGBM, LSTM, or CoxPHM, and STEP_NAME is one of tune, train, or eval. Furthermore, N_CPUS is the number of CPUs and N_GPUS is the number of GPUs.
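For readers who want to see how such a command line is typically parsed, here is a minimal argparse sketch of the four options. It is an assumption about the interface, not a copy of main.py: the build_parser helper is hypothetical, and the defaults (no model or step selected, 1 CPU, 1 GPU) are inferred from the description of running the script without arguments.

```python
import argparse

MODELS = ("LGBM", "LSTM", "CoxPHM")
STEPS = ("tune", "train", "eval")


def build_parser():
    """Build a parser mirroring the four documented optional arguments."""
    parser = argparse.ArgumentParser(
        description="Tune, train or evaluate a sepsis prediction model.")
    # None is used here to mean "all models" / "all steps" (an assumption).
    parser.add_argument("--model", choices=MODELS, default=None)
    parser.add_argument("--step", choices=STEPS, default=None)
    parser.add_argument("--n_cpus", type=int, default=1)
    parser.add_argument("--n_gpus", type=int, default=1)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--model", "LGBM", "--step", "tune", "--n_cpus", "4"])
print(args.model, args.step, args.n_cpus, args.n_gpus)
```

Restricting --model and --step with choices makes a misspelled model or step name fail immediately rather than partway through a long run.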

For each of the three models (LGBM, LSTM, and CoxPHM), the required sequence of steps is tune, train, eval:

tune: For a given model, running the tuning step computes and saves optimal hyperparameters for subsequent training and evaluation.
train: The model is trained and saved to the models/ directory for subsequent evaluation.
eval: Evaluation involves generating numerical results and predictions, which are respectively saved to outputs/results and outputs/predictions.
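One of the metrics reported at evaluation is AUROC. As a reminder of what that number measures, the following pure-Python sketch computes it directly from its definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The auroc function and the toy labels and scores are illustrative assumptions; the repository's own evaluation code may well use a library routine instead.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = sepsis, 0 = no sepsis; scores are made-up model outputs.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
value = auroc(labels, scores)
print(value)
```

This quadratic-time version is only meant to make the definition concrete; rank-based implementations give the same value far more efficiently on large test sets.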

Note: To run all three above steps in the required order for all three models on 1 CPU and on 1 GPU, simply run main.py without any arguments, i.e.

python3 src/models/main.py
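If you prefer to launch the runs one at a time (for example, to resume after an interruption), the explicit commands can be generated as in the sketch below. The build_commands helper is hypothetical, and looping over models first and steps second is an assumption; the only documented requirement is that tune, train and eval run in that order for each model.

```python
import sys

MODELS = ("LGBM", "LSTM", "CoxPHM")
STEPS = ("tune", "train", "eval")  # required order within each model


def build_commands(models=MODELS, steps=STEPS, n_cpus=1, n_gpus=1,
                   script="src/models/main.py"):
    """Return one argument list per (model, step) pair, steps in order."""
    commands = []
    for model in models:
        for step in steps:
            commands.append([
                sys.executable, script,
                "--model", model,
                "--step", step,
                "--n_cpus", str(n_cpus),
                "--n_gpus", str(n_gpus),
            ])
    return commands


commands = build_commands()
for command in commands:
    # Print the arguments only; the interpreter path differs per machine.
    print(" ".join(command[1:]))
```

Each entry can be passed to subprocess.run(command, check=True) from the repository root, so that a failing step stops the sequence before a later step runs on missing outputs.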

The full pipeline can take several days to complete. Alternatively, you can download our pretrained models and obtain the results directly by running the following commands:

bash pretrained_models.sh
python3 src/models/main.py --model MODEL_NAME --step eval

Visualizations

To reproduce all the plots in the paper, first run the model evaluation step, then run the following command:

python3 src/visualization/main_plots.py
