diff --git a/docs/examples/graph-signal-cpd.ipynb b/docs/examples/graph-signal-cpd.ipynb new file mode 100644 index 00000000..c0ccb892 --- /dev/null +++ b/docs/examples/graph-signal-cpd.ipynb @@ -0,0 +1,184 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Graph signals change point detection with the Graph Fourier Scan Statistic: a low-pass band filter\n", + "\n", + "## Introduction\n", + "\n", + "Graph signal processing (GSP) is the study of multivariate signals $y_t \\in \\mathbb{R}^d$ lying on the nodes of a graph $\\mathcal{G} = (\\mathcal{V}, \\mathcal{E}, \\mathcal{W})$ (see for instance [[Stankovic2019](#Stankovic2019), [Shuman2013](#Shuman2013)]). As in standard signal processing, it is possible to define a Graph Fourier Transform and to generalize the notion of spectral filtering. The intuition behind graph spectral frequencies is that a signal whose (graph) spectrum is concentrated at low frequencies is \"smoother\" than a signal whose energy is concentrated at high frequencies. By \"smoother\", we refer to smoothness with respect to the structure of the graph [[Shuman2013](#Shuman2013)]: the smoother a signal, the closer its values on neighboring nodes.\n", + "\n", + "Thus, by applying a low-pass filter in the graph spectral domain, one is likely to remove from a graph signal the noise and/or the information that is uncorrelated across the nodes. The authors of [[Ferrari2019](#Ferrari2019)] leverage this idea to define the Graph Fourier Scan Statistic (GFSS) algorithm (derived from the statistic introduced in [[Sharpnack2016](#Sharpnack2016)]). When a graph structure is available, this is one possible way of using notions from the field of GSP to enhance change point detection for multivariate signals.\n", + "\n", + "In what follows, we focus on this approach and show how to apply it with `ruptures`. 
This example relies on the class [CostGFSSL2](../user-guide/costs/costgfssl2.md), which combines the GFSS with the least squared deviation cost.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Illustration and objectives\n", + "\n", + "First, we briefly illustrate the behavior of the GFSS and motivate the use of the cost function `CostGFSSL2`. For a formal definition, please see [CostGFSSL2](../user-guide/costs/costgfssl2.md).\n", + "\n", + "Applying the GFSS amounts to a low-pass graph spectral filtering parametrized by the so-called cut-sparsity $\\rho$. The corresponding filter, $h(\\lambda) = \\min(1, \\sqrt{\\rho / \\lambda})$, is displayed below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "\n", + "rho = 1\n", + "gfss_filter = lambda x, rho: np.minimum(1, np.sqrt(rho / x))\n", + "x = np.linspace(0, 10, 100)\n", + "filtered = [1] + list(gfss_filter(x[1:], rho))  # h(0) = 1; skip x = 0 to avoid dividing by zero\n", + "\n", + "fig, ax = plt.subplots(1, 1, figsize=(6, 3))\n", + "ax.plot(x, filtered, label=\"GFSS filter\")\n", + "ax.axvline(x=rho, linestyle=\"--\", c=\"k\", label=\"$\\\\rho$\")\n", + "ax.set_xlabel(\"eigenvalues $\\\\lambda$\")\n", + "ax.legend()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As explained in the [introduction](#introduction), applying a low-pass filter to a graph signal shrinks its high-frequency components, thus attenuating the signal components that are not smooth with respect to the graph structure. From this, we deduce two potential (and related) benefits of using the cost `CostGFSSL2`:\n", + "\n", + "1. as in [[Ferrari2019](#Ferrari2019)], detecting mean changes that are localized on clusters of the graph only. 
Formally, if we denote by $m_t(i)$ the mean of the process at node $i$ and time $t$, and by $\\mathcal{C}$ a well-connected subset of $\\mathcal{V}$ (a cluster), one may try to detect the change point $t_r$ such that:\n", + "\n", + "$$\n", + "y_t = m_t + e_t \\quad \\text{ with } ~ m_t = \n", + "\\begin{cases}\n", + "    m & \\forall t < t_r \\\\ \n", + "    m + \\delta & \\forall t \\geq t_r \n", + "\\end{cases}\n", + "\\quad \\text{ and } ~ \\delta(i) = \n", + "\\begin{cases}\n", + "    c > 0 & \\forall i \\in \\mathcal{C} \\\\ \n", + "    0 & \\text{ otherwise } \n", + "\\end{cases}\n", + "$$\n", + "\n", + "2. attenuating changes that are induced by spatially white noise (with high variance) or that may be due to malfunctions of individual components of the observed system, for instance in a geographical sensor network or a social network." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "We generate a synthetic graph matching the above description and define a signal over it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import ruptures as rpt  # our package\n", + "import networkx as nx  # for graph utils" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Graph generation\n", + "\n", + "nb_nodes = 120\n", + "cluster_nb = 6\n", + "mean_cluster_size = 20\n", + "inter_density = 0.02  # density of inter-cluster edges\n", + "intra_density = 0.9  # density of intra-cluster edges\n", + "graph_seed = 9  # for reproducibility\n", + "G = nx.gaussian_random_partition_graph(\n", + "    n=nb_nodes,\n", + "    s=mean_cluster_size,\n", + "    v=2 * mean_cluster_size,\n", + "    p_in=intra_density,\n", + "    p_out=inter_density,\n", + "    seed=graph_seed,\n", + ")\n", + "coord = nx.spring_layout(G)  # for plotting" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Visualization of the graph clusters\n", + "\n", + "
clusters_seed = 20\n", + "clusters = nx.algorithms.community.louvain.louvain_communities(G, seed=clusters_seed)\n", + "colors_dct = {0: \"r\", 1: \"b\", 2: \"g\", 3: \"orange\", 4: \"purple\", 5: \"brown\"}\n", + "cluster_idx_arr = np.zeros(nb_nodes, dtype=int)  # cluster index of each node\n", + "\n", + "for cl_ind, cluster in enumerate(clusters):\n", + "    for node_ind in cluster:\n", + "        cluster_idx_arr[node_ind] = cl_ind\n", + "\n", + "colors_l = [colors_dct[cluster_idx_arr[node_ind]] for node_ind in range(nb_nodes)]\n", + "\n", + "fig, ax = plt.subplots(1, 1, figsize=(8, 5))\n", + "ax.set_title(\"Cluster visualization\")\n", + "nx.draw_networkx(G, pos=coord, with_labels=True, node_color=colors_l, ax=ax)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## References\n", + "\n", + "[Ferrari2019]\n", + "Ferrari, A., Richard, C., and Verduci, L. (2019). Distributed Change Detection in Streaming Graph Signals. IEEE 8th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pages 166–170.\n", + "\n", + "[Sharpnack2016]\n", + "Sharpnack, J., Rinaldo, A., and Singh, A. (2016). Detecting Anomalous Activity on Networks With the Graph Fourier Scan Statistic. IEEE Transactions on Signal Processing, 64(2):364–379.\n", + "\n", + "[Shuman2013]\n", + "Shuman, D. I., Narang, S. K., Frossard, P., Ortega, A., and Vandergheynst, P. (2013). The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98.\n", + "\n", + "[Stankovic2019]\n", + "Stankovic, L., Mandic, D. P., Dakovic, M., Kisil, I., Sejdic, E., and Constantinides, A. G. (2019). Understanding the Basis of Graph Signal Processing via an Intuitive Example-Driven Approach [Lecture Notes]. 
IEEE Signal Processing Magazine, 36(6):133–145.\n", + "\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}