Experiments in explainable AI with exact optimization tools on the MNIST image dataset. Evaluating the robustness of a classifier by computing (provably, globally) smallest adversarial perturbations.
Dependencies:
- python3
- CPLEX (and docplex python library)
- matplotlib
- numpy
- keras
Proof-of-concept scripts for a simple neural network on a binary classification task.
train_twoclass.py
Train a simple fully connected neural network with one hidden layer.
min_explanation.py
Compute an "explanation" of a prediction. Given an input image this is a minimal set of pixels which determine the output label, regardless the value of any other pixels. Uses CPLEX as a decision procedure for a "destructive MUS" algorithm.
min_adv_sum.py
Compute the smallest adversarial example with respect to the sum of squared pixel differences from the original image, using mixed integer quadratic programming (MIQP).
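A sketch of the formulation, assuming a one-hidden-layer ReLU network with weights $W, V$, biases $b, c$, original image $\hat{x}$, original label $\ell$, target label $t$, and a sufficiently large big-M constant (the notation is mine, not taken from the script):

```latex
\begin{align*}
\min_{x,\,h,\,\delta}\ & \sum_i (x_i - \hat{x}_i)^2 \\
\text{s.t.}\ & 0 \le x_i \le 1 && \text{(valid pixel values)}\\
& h_j \ge W_j x + b_j, \quad h_j \ge 0 && \text{(ReLU lower bounds)}\\
& h_j \le W_j x + b_j + M(1-\delta_j), \quad h_j \le M\delta_j && \text{(big-M upper bounds)}\\
& \delta_j \in \{0,1\} && \text{(ReLU phase indicator)}\\
& (Vh + c)_t \ge (Vh + c)_\ell && \text{(label } t \text{ beats original label } \ell\text{)}
\end{align*}
```

The binary variables $\delta_j$ select which side of each ReLU is active; the quadratic objective is what makes this an MIQP rather than a plain MIP.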
min_adv_card.py
Compute a smallest adversarial example with respect to the total number of changed pixels from the original input, using mixed integer programming (MIP).
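The cardinality objective can be sketched with one binary change indicator per pixel (again in my own notation, with pixels in $[0,1]$ so that $d_i = 0$ pins a pixel to its original value and $d_i = 1$ frees it):

```latex
\begin{align*}
\min_{x,\,d}\ & \sum_i d_i \\
\text{s.t.}\ & \hat{x}_i - d_i \le x_i \le \hat{x}_i + d_i, \quad d_i \in \{0,1\}\\
& \text{plus the same network-encoding and misclassification constraints}
\end{align*}
```

Since the objective is now linear, the problem is a plain MIP rather than an MIQP.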
Apply the above techniques to a multiclass classifier.
train_multiclass.py
Train a somewhat more complicated neural network with multiple hidden layers and output classes.
min_adv_sum.py
For a given input image, compute the minimal change needed to make the network predict each possible label.
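Targeting each label in turn only changes the misclassification constraint. Writing the logits as $z = Vh + c$ and using a small margin $\varepsilon > 0$ (my notation), the constraint for target label $t$ is:

```latex
z_t \ge z_j + \varepsilon \qquad \text{for all } j \ne t
```

Solving the program once per target label then yields one minimal perturbation per class.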
Can we do the same with simple convolutional networks?
train_cnn_simple.py
Train a simple CNN with 10 3x3 convolution kernels.
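Convolution poses no new encoding difficulty: each output entry is a fixed linear combination of a 3x3 window of inputs, so a convolution layer contributes only linear constraints, exactly like a sparse, weight-shared dense layer. A minimal sketch of the operation itself (plain-Python reference, not code from the repo):

```python
def conv3x3(img, kernel):
    """2-D 'valid' cross-correlation of an image with a single 3x3 kernel.

    Each output entry is a fixed linear combination of 9 input pixels,
    so the layer adds only linear constraints to the MIP/MIQP encodings.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for r in range(h - 2):
        for c in range(w - 2):
            out[r][c] = sum(kernel[i][j] * img[r + i][c + j]
                            for i in range(3) for j in range(3))
    return out
```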
min_adv_sum.py
As above, but the minimum adversarial changes show clear visual differences from those found for the network without convolution.
min_adv_card.py
Compute the minimum number of changed pixels instead.
binary.py
Implements the encodings from the AAAI paper "Verifying Properties of Binarized Deep Neural Networks".
- MIP: working
- IP: working
- CNF: producing intractably large formulas
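The neuron semantics all three encodings target can be sketched as follows: with inputs and weights in {-1, +1}, a binarized neuron computes sign(w·x + b), which is equivalent to a cardinality threshold on the number of input/weight agreements (function names and the threshold derivation are mine, not from binary.py):

```python
import math

def bnn_neuron(xs, ws, b):
    """Reference semantics of a binarized neuron: sign(w.x + b),
    with inputs xs and weights ws in {-1, +1}."""
    s = sum(w * x for w, x in zip(ws, xs)) + b
    return 1 if s >= 0 else -1

def agree_threshold(ws, b):
    """Equivalent cardinality form: output is +1 iff
    |{i : x_i == w_i}| >= k.  With n inputs, w.x = 2*agree - n,
    so w.x + b >= 0 iff agree >= ceil((n - b) / 2)."""
    n = len(ws)
    return math.ceil((n - b) / 2)
```

This cardinality view is what makes MIP and IP encodings compact, and it is also the part whose naive CNF translation blows up, consistent with the intractably large formulas noted above.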