Change Detection (CD) refers to identifying shifts in the distribution of a monitored data stream [1]. In this context, we focus on detecting abrupt and permanent changes in a univariate data stream.
This repository implements two widely used CD methods: the Cumulative Sum (CUSUM) [2] and the Change-Point Model (CPM) [3, 4]. Both are implemented entirely in Python, so we named them pyCUSUM and pyCPM. Fig. 1 and Fig. 2 show example runs of pyCUSUM and pyCPM, respectively.
We chose to implement these methods because both are non-parametric, meaning they can monitor data without assuming any specific distribution. They differ in how they consume the data. CUSUM performs sequential monitoring: it analyzes the stream online, as each sample arrives. CPM performs batch-wise monitoring: all data is available upfront for offline analysis, which typically yields more accurate detection.
We also compared our pyCPM with the CPM implementation in the R cpm package [4] and found that the difference between their results is negligible. Python users can therefore use pyCPM without installing the R version.
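To make the idea of sequential monitoring concrete, here is a minimal sketch of a two-sided CUSUM. It is not the code in src/CUSUM.py: the function name `cusum_detect`, its parameters (`n_reference`, `drift`, `threshold`), and their default values are assumptions chosen for illustration. The sketch also assumes that the first `n_reference` samples are change-free, so they can be used to standardize the rest of the stream.

```python
import random
import statistics


def cusum_detect(stream, n_reference=50, drift=0.5, threshold=5.0):
    """Two-sided CUSUM on a stream standardized with a reference window.

    Returns the index of the first alarm, or None if no alarm is raised.
    All names and default values are illustrative assumptions.
    """
    reference = stream[:n_reference]
    mu = statistics.fmean(reference)
    sigma = statistics.stdev(reference)  # assumes the reference window is not constant

    g_pos = 0.0  # accumulated evidence of an upward shift
    g_neg = 0.0  # accumulated evidence of a downward shift
    for t in range(n_reference, len(stream)):
        z = (stream[t] - mu) / sigma
        # Each statistic grows only when the deviation exceeds the drift,
        # and is reset to zero whenever it would become negative.
        g_pos = max(0.0, g_pos + z - drift)
        g_neg = max(0.0, g_neg - z - drift)
        if g_pos > threshold or g_neg > threshold:
            return t
    return None


rng = random.Random(0)
# 200 samples from N(0, 1) followed by 200 samples from N(2, 1).
stream = [rng.gauss(0.0, 1.0) for _ in range(200)]
stream += [rng.gauss(2.0, 1.0) for _ in range(200)]
alarm = cusum_detect(stream)
print("alarm raised at index:", alarm)
```

The samples are processed one at a time and the loop stops at the first alarm, which is what makes the procedure usable online. A larger `threshold` lowers the false-alarm rate at the cost of a longer detection delay.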
- Clone our repository.
- Install the required libraries by running: pip install -r requirements.txt
- Install our package in editable mode by running the following from the root folder: pip3 install -e .
All the steps to generate and monitor a data stream are outlined in the Jupyter notebook change_detection.ipynb. Specifically:
- Data Stream Generation: A synthetic data stream with an abrupt and permanent change in mean is generated. This is achieved using the function in src/data.py.
- Change Detection: Comparison between the CUSUM and CPM methods.
- CPM Comparison: Comparison between our CPM implementation and the version available in the R library.
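The first step above can be sketched as follows. This is a hypothetical stand-in, not the function in src/data.py: the name `make_stream`, its parameters, and the choice of Gaussian noise are all assumptions made for illustration.

```python
import random


def make_stream(n_before=300, n_after=300, mean_before=0.0,
                mean_after=1.5, std=1.0, seed=0):
    """Return (stream, change_point) for a stream with a step change in mean.

    The change is abrupt (it happens between two consecutive samples) and
    permanent (the mean never returns to its initial value).
    """
    rng = random.Random(seed)
    before = [rng.gauss(mean_before, std) for _ in range(n_before)]
    after = [rng.gauss(mean_after, std) for _ in range(n_after)]
    # change_point is the index of the first post-change sample.
    return before + after, n_before


stream, change_point = make_stream()
print(len(stream), change_point)  # prints "600 300"
```

Returning the true change point together with the stream makes it easy to measure how far a detector's estimate is from the ground truth.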
The implementations of both CUSUM and CPM are located in the src/ folder:
- src/CUSUM.py: Our implementation of the CUSUM method.
- src/CPM.py: Our implementation of the CPM method. Within CPM, we use the Mann-Whitney U, Mood, and Lepage tests, which are implemented in src/StatisticalTest.py.
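As a rough illustration of how a batch CPM can use the Mann-Whitney U test, the sketch below scans every admissible split point, computes the standardized U statistic of the samples before the split against those after it, and returns the split where the statistic is largest. It is not the code in src/CPM.py or src/StatisticalTest.py: the names `midranks`, `mann_whitney_scan`, and `min_segment` are hypothetical, the variance formula omits the tie correction, and the Mood and Lepage tests are not covered. A change would be declared only if the returned statistic exceeds a threshold chosen for the desired false-alarm rate; this sketch leaves that threshold to the caller.

```python
import math
import random


def midranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def mann_whitney_scan(stream, min_segment=5):
    """Return (k_hat, d_max): the split with the largest standardized U.

    k_hat is the index of the first sample of the second segment.
    Returns (None, 0.0) if the stream is too short to be split.
    """
    n = len(stream)
    ranks = midranks(stream)  # rank once, then reuse for every split
    best_k, best_d = None, 0.0
    rank_sum = sum(ranks[:min_segment - 1])
    for k in range(min_segment, n - min_segment + 1):
        rank_sum += ranks[k - 1]          # rank sum of stream[:k]
        u = rank_sum - k * (k + 1) / 2    # U statistic of the first segment
        mean = k * (n - k) / 2
        std = math.sqrt(k * (n - k) * (n + 1) / 12)  # no tie correction
        d = abs(u - mean) / std
        if d > best_d:
            best_k, best_d = k, d
    return best_k, best_d


rng = random.Random(1)
stream = [rng.gauss(0.0, 1.0) for _ in range(100)]
stream += [rng.gauss(2.0, 1.0) for _ in range(100)]
k_hat, d_max = mann_whitney_scan(stream)
print("estimated change point:", k_hat, "statistic:", round(d_max, 2))
```

Because the test relies only on ranks, the scan makes no assumption about the distribution of the data, which is the non-parametric property described earlier.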
Additionally, src/cpm_r_comparison.py calls the R implementation of CPM. Please note that to run this comparison, R must be installed.
We also include visualizations to illustrate the performance of each method. Plotting functions can be found in src/Plotter.py.
If you use our code in a scientific publication, we would appreciate citations using the following format:
@misc{rizzo2024:change_detection,
author = {Antonino Maria Rizzo},
title = {Change Detection: pyCUSUM and pyCPM},
year = {2024},
url = {https://github.com/antoninomariarizzo/change-detection},
}
[1] Basseville, M., and Nikiforov, I. Detection of abrupt change: Theory and application. Prentice-Hall, Inc. (1993).
[2] Tartakovsky, A., Nikiforov, I., and Basseville, M. Sequential analysis: Hypothesis testing and changepoint detection. Chapman & Hall. (2014).
[3] Ross, G., Tasoulis, D., and Adams, N. Nonparametric monitoring of data streams for changes in location and scale. Technometrics. (2012).
[4] Ross, G. J. Parametric and nonparametric sequential change detection in R: The cpm package. Journal of Statistical Software. (2015).