Commit
Merge pull request #16 from fabiankueppers/develop
Merge develop into master to add new release version
fabiankueppers authored Jun 30, 2021
2 parents cc520e6 + e69ea69 commit c6c1fb7
Showing 106 changed files with 11,838 additions and 3,739 deletions.
574 changes: 201 additions & 373 deletions LICENSE.txt

Large diffs are not rendered by default.

159 changes: 142 additions & 17 deletions README.rst
@@ -6,11 +6,11 @@ For full API reference documentation, visit https://fabiankueppers.github.io/cal
Copyright (C) 2019-2021 Ruhr West University of Applied Sciences, Bottrop, Germany
AND Elektronische Fahrwerksysteme GmbH, Gaimersheim, Germany

This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.
This Source Code Form is subject to the terms of the Apache License 2.0.
If a copy of the APL2 was not distributed with this
file, You can obtain one at https://www.apache.org/licenses/LICENSE-2.0.txt.

If you use this framework or parts of it for your research, please cite it by::
**Important: updated references!** If you use this framework (*classification or detection*) or parts of it for your research, please cite it by::

@InProceedings{Kueppers_2020_CVPR_Workshops,
author = {Küppers, Fabian and Kronenberger, Jan and Shantia, Amirhossein and Haselhoff, Anselm},
@@ -20,6 +20,16 @@ If you use this framework or parts of it for your research, please cite it by::
year = {2020}
}

*If you use Bayesian calibration methods with uncertainty, please cite it by*::

@InProceedings{Kueppers_2021_IV,
author = {Küppers, Fabian and Kronenberger, Jan and Schneider, Jonas and Haselhoff, Anselm},
title = {Bayesian Confidence Calibration for Epistemic Uncertainty Modelling},
booktitle = {Proceedings of the IEEE Intelligent Vehicles Symposium (IV)},
month = {July},
year = {2021},
}

.. contents:: Table of Contents
:depth: 2

@@ -30,6 +40,21 @@ This framework is designed to calibrate the confidence estimates of classifiers

For example: given 100 predictions, each with a confidence of 80%, the observed accuracy should also be about 80% (neither more nor less). This behaviour is achievable with several calibration methods.
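
As a small, self-contained sketch (synthetic data only, not part of the package examples), this behaviour can be checked directly with NumPy: among all predictions with a confidence of roughly 80%, the observed accuracy of a calibrated model should also be close to 80%.

.. code-block:: python

    import numpy as np

    np.random.seed(0)

    # synthetic, perfectly calibrated toy data: the probability of being correct
    # equals the stated confidence by construction
    confidences = np.random.uniform(0.5, 1.0, size=10000)
    correct = np.random.uniform(size=10000) < confidences

    # select all predictions with a confidence of roughly 80%
    mask = (confidences >= 0.775) & (confidences <= 0.825)

    print("mean confidence:", confidences[mask].mean())   # ~0.80
    print("observed accuracy:", correct[mask].mean())     # close to 0.80 for a calibrated model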

Update on version 1.2
---------------------
TL;DR:
- Bayesian confidence calibration: train and infer scaling methods using variational inference (VI) and MCMC sampling
- New metrics: MMCE [13]_ and PICP [14]_ (*netcal.metrics.MMCE* and *netcal.metrics.PICP*)
- New regularization methods: MMCE [13]_ and DCA [15]_ (*netcal.regularization.MMCEPenalty* and *netcal.regularization.DCAPenalty*)
- Updated examples
- Switched license from MPL2 to APL2

Now you can also use Bayesian methods to obtain uncertainty within a calibration mapping, mainly in the *netcal.scaling* package. We adapted Markov-Chain Monte-Carlo (MCMC) sampling as well as Variational Inference (VI) for the common calibration methods.
It is also easily possible to move the scaling methods to CUDA in order to speed up the computations. We further provide new metrics to evaluate confidence calibration (MMCE) and the quality of prediction intervals (PICP).
Finally, we extended the framework with new regularization methods that can be used during model training (MMCE and DCA).

Update on version 1.1
---------------------
This framework can also be used to calibrate object detection models. It has recently been shown that calibration on object detection also depends on the position and/or scale of a predicted object [12]_. We provide calibration methods to perform confidence calibration w.r.t. the additional box regression branch.
For this purpose, we extended the commonly used Histogram Binning [3]_, Logistic Calibration (also known as Platt scaling) [10]_ and the Beta Calibration method [2]_ to also include the bounding box information in the calibration mapping.
Furthermore, we provide two new methods called the *Dependent Logistic Calibration* and the *Dependent Beta Calibration* that are not only able to perform a calibration mapping
@@ -58,12 +83,16 @@ Or simply invoke the following command to install the calibration suite::

Requirements
------------
- numpy>=1.15
- numpy>=1.17
- scipy>=1.3
- matplotlib>=3.1
- scikit-learn>=0.20.0
- torch>=1.1
- tqdm
- scikit-learn>=0.21
- torch>=1.4
- torchvision>=0.5.0
- tqdm>=4.40
- pyro-ppl>=1.3
- tikzplotlib>=0.9.8
- tensorboard>=2.2


Calibration Metrics
@@ -74,10 +103,12 @@ For object detection, we implemented the *Detection Calibration Error* (D-ECE) [
- (Detection) Expected Calibration Error [1]_, [12]_ (*netcal.metrics.ECE*)
- (Detection) Maximum Calibration Error [1]_, [12]_ (*netcal.metrics.MCE*)
- (Detection) Average Calibration Error [11]_, [12]_ (*netcal.metrics.ACE*)
- Maximum Mean Calibration Error (MMCE) [13]_ (*netcal.metrics.MMCE*) (no position-dependency)
- Prediction Interval Coverage Probability (PICP) [14]_ (*netcal.metrics.PICP*) - this score is not a direct measure of confidence calibration but rather measures the quality of the estimated prediction intervals (a short usage sketch of the metrics API follows below).
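
As a quick orientation, here is a rough sketch of the metrics API for the classification case on synthetic data. The assumption that *MCE* and *ACE* share the same constructor and *measure()* interface as *ECE* (which is used in the examples below) should be verified against the API reference.

.. code-block:: python

    import numpy as np
    from netcal.metrics import ECE, MCE, ACE

    # synthetic binary example: confidence estimates and binary ground-truth labels
    np.random.seed(0)
    confidences = np.random.uniform(0.5, 1.0, size=1000)
    ground_truth = (np.random.uniform(size=1000) < confidences).astype(int)

    n_bins = 10

    # assumption: MCE and ACE follow the same interface as ECE
    print("ECE:", ECE(n_bins).measure(confidences, ground_truth))
    print("MCE:", MCE(n_bins).measure(confidences, ground_truth))
    print("ACE:", ACE(n_bins).measure(confidences, ground_truth))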

Methods
==========
The calibration methods are separated into binning and scaling methods. The binning methods divide the available information into several bins (like ECE or D-ECE) and perform calibration on each bin. The scaling methods scale the confidence estimates or logits directly to calibrated confidence estimates - on detection calibration, this is done w.r.t. the additional regression branch of a network.
=======
The post-hoc calibration methods are separated into binning and scaling methods. The binning methods divide the available information into several bins (like ECE or D-ECE) and perform calibration on each bin. The scaling methods scale the confidence estimates or logits directly to calibrated confidence estimates - on detection calibration, this is done w.r.t. the additional regression branch of a network.

Important: if you use the detection mode, you need to specify the flag "detection=True" in the constructor of the corresponding method (this is not necessary for *netcal.scaling.LogisticCalibrationDependent* and *netcal.scaling.BetaCalibrationDependent*).

@@ -106,22 +137,30 @@ Implemented scaling methods are:
- Beta Calibration for classification [2]_ and object detection [12]_ (*netcal.scaling.BetaCalibration*)
- Dependent Beta Calibration for object detection [12]_ (*netcal.scaling.BetaCalibrationDependent*) - on detection, this method is able to capture correlations between all input quantities and should be preferred over Beta Calibration for object detection

**New on version 1.2:** you can provide a parameter named "method" to the constructor of each scaling method (a short sketch follows this list). This parameter can be one of the following:

- 'mle': use the method in a feed-forward manner with maximum likelihood estimates of the calibration parameters (default)
- 'momentum': use non-convex momentum optimization (e.g. the default for dependent beta calibration)
- 'mcmc': use Markov-Chain Monte-Carlo sampling to obtain multiple parameter sets in order to quantify uncertainty in the calibration
- 'variational': use Variational Inference to obtain multiple parameter sets in order to quantify uncertainty in the calibration
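
A rough sketch of how this parameter might be passed, using synthetic detection-style data and only constructor arguments mentioned in this README:

.. code-block:: python

    import numpy as np
    from netcal.scaling import LogisticCalibration, BetaCalibrationDependent

    # synthetic detection-style data: [confidence, relative x position] and binary matches
    np.random.seed(0)
    confidences = np.random.uniform(0.0, 1.0, size=1000)
    relative_x_position = np.random.uniform(0.0, 1.0, size=1000)
    matched = (np.random.uniform(size=1000) < confidences).astype(int)
    input = np.stack((confidences, relative_x_position), axis=1)

    # standard feed-forward calibration with maximum likelihood estimates (default)
    lr = LogisticCalibration(detection=True, method='mle')
    lr.fit(input, matched)
    calibrated = lr.transform(input)

    # non-convex momentum optimization, e.g. for the dependent beta calibration
    beta_dependent = BetaCalibrationDependent(method='momentum')
    beta_dependent.fit(input, matched)
    calibrated_dependent = beta_dependent.transform(input)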

Regularization
--------------
Implemented regularization methods are:
With some effort, it is also possible to push the model training towards calibrated confidences by regularization (see the sketch after this list). Implemented regularization methods are:

- Confidence Penalty [8]_ (*netcal.regularization.confidence_penalty*)
- Confidence Penalty [8]_ (*netcal.regularization.confidence_penalty* and *netcal.regularization.ConfidencePenalty* - the latter is a PyTorch implementation that can be used as a regularization term during training)
- Maximum Mean Calibration Error (MMCE) [13]_ (*netcal.regularization.MMCEPenalty* - PyTorch regularization module)
- DCA [15]_ (*netcal.regularization.DCAPenalty* - PyTorch regularization module)
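
The sketch below indicates how such a penalty could be combined with a standard cross-entropy loss during training. It is only a rough sketch on a toy model: the assumption that *MMCEPenalty* can be constructed without arguments and called like a PyTorch loss module with the predicted confidences and the ground-truth labels must be checked against the API reference.

.. code-block:: python

    import torch
    import torch.nn as nn
    from netcal.regularization import MMCEPenalty

    # toy setup: a small linear classifier on random data
    model = nn.Linear(16, 4)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    cross_entropy = nn.CrossEntropyLoss()

    # assumption: the penalty acts like a loss module on (confidences, labels)
    mmce = MMCEPenalty()
    weight = 2.0  # hypothetical weight of the calibration penalty

    inputs = torch.randn(128, 16)
    labels = torch.randint(0, 4, (128,))

    for _ in range(10):
        optimizer.zero_grad()
        logits = model(inputs)
        confidences = torch.softmax(logits, dim=1)

        # total loss = task loss + weighted calibration penalty
        loss = cross_entropy(logits, labels) + weight * mmce(confidences, labels)
        loss.backward()
        optimizer.step()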

Visualization
================
=============
For the visualization of miscalibration, one can use Confidence Histograms & Reliability Diagrams. These diagrams are similar to the ECE: the output space is divided into equally spaced bins, and the calibration gap between bin accuracy and bin confidence is visualized as a histogram.

On detection calibration, the miscalibration can be visualized either along one additional box quantity (e.g. the x-position of the predictions) or over two additional box quantities in terms of a heatmap. A minimal plotting sketch is given after the following list.

- Reliability Diagram [1]_, [12]_ (*netcal.presentation.ReliabilityDiagram*)
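
A rough sketch for the classification case on synthetic data; the assumption that the constructor takes the number of bins as its first argument should be checked against the API reference (the detection examples below show the *plot* calls on detection data):

.. code-block:: python

    import numpy as np
    from netcal.presentation import ReliabilityDiagram

    # synthetic binary example: confidence estimates and binary ground-truth labels
    np.random.seed(0)
    confidences = np.random.uniform(0.5, 1.0, size=1000)
    ground_truth = (np.random.uniform(size=1000) < confidences).astype(int)

    n_bins = 10
    diagram = ReliabilityDiagram(n_bins)      # assumption: number of bins as first argument
    diagram.plot(confidences, ground_truth)   # visualize the miscalibration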

Examples
===========
========
The calibration methods work with the predicted confidence estimates of a neural network and, on detection, also with the bounding box regression branch.

Classification
@@ -170,7 +209,7 @@ The miscalibration can be visualized with a Reliability Diagram:
Detection
---------
This is a basic example which uses softmax predictions of a classification task with 10 classes and the given NumPy arrays:
In this example we use confidence predictions of an object detection model with the corresponding x-position of the predicted bounding boxes. The ground truth provided to the calibration algorithm denotes whether a bounding box has matched a ground-truth box at a certain IoU and with the correct class label.

.. code-block:: python
@@ -187,11 +226,11 @@ This is an example for *netcal.scaling.LogisticCalibration* and *netcal.scaling.
input = np.stack((confidences, relative_x_position), axis=1)
lr = LogisticCalibration(detection=True) # flag 'detection=True' is mandatory for this method
lr = LogisticCalibration(detection=True, use_cuda=False) # flag 'detection=True' is mandatory for this method
lr.fit(input, matched)
calibrated = lr.transform(input)
lr_dependent = LogisticCalibrationDependent() # flag 'detection=True' is not necessary as this method is only defined for detection
lr_dependent = LogisticCalibrationDependent(use_cuda=False) # flag 'detection=True' is not necessary as this method is only defined for detection
lr_dependent.fit(input, matched)
calibrated = lr_dependent.transform(input)
@@ -220,6 +259,89 @@ The miscalibration can be visualized with a Reliability Diagram:
diagram.plot(input, matched) # visualize miscalibration of uncalibrated
diagram.plot(input_calibrated, matched) # visualize miscalibration of calibrated
Uncertainty in Calibration
--------------------------
We can also quantify the uncertainty in a calibration mapping if we use a Bayesian view on the calibration models. We can sample multiple parameter sets using MCMC sampling or VI. In this example, we reuse the data of the previous detection example.

.. code-block:: python
matched # binary NumPy 1-D array (0, 1) that indicates if a bounding box has matched a ground truth at a certain IoU with the right label - shape: (n_samples,)
confidences # NumPy 1-D array with confidence estimates between 0-1 - shape: (n_samples,)
relative_x_position # NumPy 1-D array with relative center-x position between 0-1 of each prediction - shape: (n_samples,)
This is an example for *netcal.scaling.LogisticCalibration* and *netcal.scaling.LogisticCalibrationDependent* but also works for every calibration method (keep the different constructor parameters in mind):

.. code-block:: python
import numpy as np
from netcal.scaling import LogisticCalibration, LogisticCalibrationDependent
input = np.stack((confidences, relative_x_position), axis=1)
# flag 'detection=True' is mandatory for this method
# use Variational Inference with 2000 optimization steps for creating this calibration mapping
lr = LogisticCalibration(detection=True, method='variational', vi_epochs=2000, use_cuda=False)
lr.fit(input, matched)
# 'num_samples=1000': sample 1000 parameter sets from VI
# thus, 'calibrated' has shape [1000, n_samples]
calibrated = lr.transform(input, num_samples=1000)
# flag 'detection=True' is not necessary as this method is only defined for detection
# this time, use Markov-Chain Monte-Carlo sampling with 250 warm-up steps, 250 parameter samples and one chain
lr_dependent = LogisticCalibrationDependent(method='mcmc',
mcmc_warmup_steps=250, mcmc_steps=250, mcmc_chains=1,
use_cuda=False)
lr_dependent.fit(input, matched)
# 'num_samples=1000': although we have only sampled 250 different parameter sets,
# we can randomly sample 1000 parameter sets from MCMC
calibrated = lr_dependent.transform(input, num_samples=1000)
You can directly pass the output to the D-ECE and PICP instances to measure miscalibration and the quality of the prediction intervals:
.. code-block:: python
from netcal.metrics import ECE
from netcal.metrics import PICP
n_bins = 10
ece = ECE(n_bins, detection=True)
picp = PICP(n_bins, detection=True)
# the following function calls are equivalent:
miscalibration = ece.measure(calibrated, matched, uncertainty="mean")
miscalibration = ece.measure(np.mean(calibrated, axis=0), matched)
# now determine uncertainty quality
uncertainty = picp.measure(calibrated, matched, uncertainty="mean")
print("D-ECE:", miscalibration)
print("PICP:", uncertainty.picp) # prediction coverage probability
print("MPIW:", uncertainty.mpiw) # mean prediction interval width
If we want to measure miscalibration and uncertainty quality as a function of the relative x-position, we need to broadcast the corresponding information:

.. code-block:: python
# broadcast and stack x information to calibrated information
broadcasted = np.broadcast_to(relative_x_position, calibrated.shape)
calibrated = np.stack((calibrated, broadcasted), axis=2)
n_bins = [10, 10]
ece = ECE(n_bins, detection=True)
picp = PICP(n_bins, detection=True)
# the following function calls are equivalent:
miscalibration = ece.measure(calibrated, matched, uncertainty="mean")
miscalibration = ece.measure(np.mean(calibrated, axis=0), matched)
# now determine uncertainty quality
uncertainty = picp.measure(calibrated, matched, uncertainty="mean")
print("D-ECE:", miscalibration)
print("PICP:", uncertainty.picp) # prediction coverage probability
print("MPIW:", uncertainty.mpiw) # mean prediction interval width
References
==========
.. [1] Naeini, Mahdi Pakdaman, Gregory Cooper, and Milos Hauskrecht: "Obtaining well calibrated probabilities using bayesian binning." Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
@@ -234,3 +356,6 @@ References
.. [10] Platt, John: "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods." Advances in large margin classifiers, 10(3): 61–74, 1999.
.. [11] Neumann, Lukas, Andrew Zisserman, and Andrea Vedaldi: "Relaxed Softmax: Efficient Confidence Auto-Calibration for Safe Pedestrian Detection." Conference on Neural Information Processing Systems (NIPS) Workshop MLITS, 2018.
.. [12] Fabian Küppers, Jan Kronenberger, Amirhossein Shantia and Anselm Haselhoff: "Multivariate Confidence Calibration for Object Detection." The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2020.
.. [13] Kumar, Aviral, Sunita Sarawagi, and Ujjwal Jain: "Trainable calibration measures for neural networks from kernel mean embeddings." International Conference on Machine Learning (ICML), 2018.
.. [14] Jiayu Yao, Weiwei Pan, Soumya Ghosh, and Finale Doshi-Velez: "Quality of Uncertainty Quantification for Bayesian Neural Network Inference." Workshop on Uncertainty and Robustness in Deep Learning, ICML, 2019.
.. [15] Liang, Gongbo, et al.: "Improved trainable calibration method for neural networks on medical imaging classification." arXiv preprint arXiv:2009.04057, 2020.
2 changes: 1 addition & 1 deletion docs/build/html/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 78e6332c97a52e1ea66f816824406211
config: e4be416fe1fade80beef75b5aa42bfc9
tags: 645f666f9bcd5a90fca523b33c5a78b7
