Open-Source Python package for Single- and Multi-Players multi-armed Bandits algorithms.
This repository contains the code of Lilian Besson’s numerical environment, written in Python (2 or 3), for numerical simulations on single-player and multi-players Multi-Armed Bandits (MAB) algorithms.
A complete Sphinx-generated documentation is on SMPyBandits.GitHub.io.
SMPyBandits contains the most complete collection of single-player (classical) bandit algorithms on the Internet (over 65!), as well as implementation of all the state-of-the-art multi-player algorithms.
I follow very actively the latest publications related to Multi-Armed Bandits (MAB) research, and usually implement quite quickly the new algorithms (see for instance, Exp3++, CORRAL and SparseUCB were each introduced by articles (for Exp3++, for CORRAL, for SparseUCB) presented at COLT in July 2017, LearnExp comes from a NIPS 2017 paper, and kl-UCB++ from an ALT 2017 paper.).
- Classical MAB have a lot of applications, from clinical trials, A/B testing, game tree exploration, and online content recommendation (my framework does not implement contextual bandit - yet).
- Multi-player MAB have applications in Cognitive Radio, and my framework implements all the collision models found in the literature, as well as all the algorithms from the last 10 years or so (rhoRand from 2009, MEGA from 2015, MusicalChair, and our state-of-the-art algorithms RandTopM and MCTopM).
With this numerical framework, simulations can run on a single CPU or a multi-core machine, and summary plots are automatically saved as high-quality PNG, PDF and EPS (ready for being used in research article). Making new simulations is very easy, one only needs to write a configuration script and basically no code! See these examples (files named configuration_...py
).
A complete Sphinx documentation for each algorithms and every piece of code (included constants in the configurations!) is available here: SMPyBandits.GitHub.io.
Note
- I (Lilian Besson) have started my PhD in October 2016, and this is a part of my on going research since December 2016.
- I launched the documentation on March 2017, I wrote my first research articles using this framework in 2017 and decided to (finally) open-source my project in February 2018.
If you use this package for your own work, please consider citing it with this piece of BibTeX:
@misc{SMPyBandits,
title = {{SMPyBandits: an Open-Source Research Framework for Single and Multi-Players Multi-Arms Bandits (MAB) Algorithms in Python}},
author = {Lilian Besson},
year = {2018},
url = {https://github.com/SMPyBandits/SMPyBandits/},
howpublished = {Online at: \url{github.com/SMPyBandits/SMPyBandits}},
note = {Code at https://github.com/SMPyBandits/SMPyBandits/, documentation at https://smpybandits.github.io/}
}
I also wrote a small paper to present SMPyBandits, and I will send it to JMLR MLOSS. The paper can be consulted here on my website.
- More than 65 algorithms, including all known variants of the UCB, kl-UCB, MOSS and Thompson Sampling algorithms, as well as other less known algorithms (OCUCB, BESA, OSSB etc).
- SparseWrapper is a generalization of the SparseUCB from this article.
- Implementation of very recent Multi-Armed Bandits algorithms, e.g., kl-UCB++ (from this article), UCB-dagger (from this article), or MOSS-anytime (from this article).
- Experimental policies: BlackBoxOpt or UnsupervisedLearning (using Gaussian processes to learn the arms distributions).
- My framework mainly targets stochastic bandits, with arms following Bernoulli, bounded (SMPyBandits/truncated) or unbounded Gaussian, Exponential, Gamma or Poisson distributions.
- The default configuration is to use a fixed problem for N repetitions (e.g. 1000 repetitions, use MAB.MAB), but there is also a perfect support for "Bayesian" problems where the mean vector µ1,…,µK change at every repetition (see MAB.DynamicMAB).
- There is also a good support for Markovian problems, see MAB.MarkovianMAB, even though I didn’t implement any policies tailored for Markovian problems.
- Everything here is done in an imperative, object oriented style. The API of the Arms, Policy and MultiPlayersPolicy classes is documented in this page.
- The code is clean, valid for both Python 2 and Python 3.
- Some piece of code come from the pymaBandits project, but most of them were refactored. Thanks to the initial project!
- G.Varoquaux’s joblib is used for the Evaluator and EvaluatorMultiPlayers classes, so the simulations are easily parallelized on multi-core machines. (Put
n_jobs = -1
orPARALLEL = True
in the config file to use all your CPU cores, as it is by default).
See this document: How_to_run_the_code.md for more details.
TL;DR: this short bash snippet shows how to clone the code, install the requirements for Python 3 (in a virtualenv, and starts some simulation for N=100 repetitions of the default non-Bayesian Bernoulli-distributed problem, for K=9 arms, an horizon of T=10000 and on 4 CPUs (it should take about 20 minutes for each simulations):
cd /tmp/
# just be sure you have the latest virtualenv from Python 3
sudo pip3 install --upgrade --force-reinstall virtualenv
# create and active the virtualenv
virtualenv venv
. venv/bin/activate
type pip # check it is /tmp/venv/bin/pip
type python # check it is /tmp/venv/bin/python
pip install SMPyBandits # pulls latest version from https://pypi.org/project/SMPyBandits/
# or you can also
pip install git+https://github.com/SMPyBandits/SMPyBandits/#egg=SMPyBandits[full] # pulls latest version from https://github.com/SMPyBandits/SMPyBandits/
# run a single-player simulation!
N=100 T=10000 K=9 N_JOBS=4 make single
# run a multi-player simulation!
N=100 T=10000 M=3 K=9 N_JOBS=4 make more
- If speed matters to you and you want to use algorithms based on kl-UCB, you should take the time to build and install the fast C implementation of the utilities KL functions. Default is to use kullback.py, but using the C version from Policies/C/ really speeds up the computations. Just follow the instructions, it should work well (you need
gcc
to be installed). - And if speed matters, be sure that you have a working version of Numba, it is used by many small functions to (try to automatically) speed up the computations.
- This work is still experimental! It’s active research. It should be completely bug free and every single module/file should work perfectly(as this pylint log and this other one says), but bugs are sometimes hard to spot so if you encounter any issue, please fill a bug ticket.
- Whenever I add a new feature, I run experiments to check that nothing is broken. But there is no unittest (I don’t have time). You would have to trust me!
- This project is NOT meant to be a library that you can use elsewhere, but a research tool. In particular, I don’t take ensure that any of the Python modules can be imported from another directory than the main directory.
Contributions (issues, questions, pull requests) are of course welcome, but this project is and will stay a personal environment designed for quick research experiments, and will never try to be an industry-ready module for applications of Multi-Armed Bandits algorithms.
If you want to contribute, please have a look to the CONTRIBUTING page, and if you want to be more seriously involved, read the CODE_OF_CONDUCT page.
- You are welcome to submit an issue, if it was not previously answered,
- If you have interesting example of use of SMPyBandits, please share it! (Jupyter Notebooks are preferred). And fill a pull request to add it to the notebooks examples.
MIT Licensed (file LICENSE).
© 2016-2018 Lilian Besson.