This repo contains the code and research report for the final assignment of the Multi-Agent Systems course at Vrije Universiteit Amsterdam. The course starts by familiarising students with fundamental concepts of microeconomics, and ultimately serves as an introduction to game theory and reinforcement learning.
The final assignment consisted of two components: an implementation of Monte Carlo Tree Search (MCTS), and an implementation of the Actor-Critic algorithm, which is based on the Policy Gradient methodology. No start-up/reference code for the implementations was given.
The Monte Carlo Tree Search (MCTS) implementation here was designed to find a pre-defined leaf node in a binary tree of depth 20. To optimise the search process, we implement Upper-Confidence-Bound (UCB) action selection to balance exploration and exploitation.
The accompanying report focuses on analysing the impact of the UCB exploration parameter on the search outcomes.
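For reference, UCB action selection in MCTS scores each child node by its average return plus an exploration bonus, with the exploration parameter c controlling the trade-off between the two. A minimal sketch of the selection rule (the names below are illustrative and not taken from mcts_main.py):

```python
import math

def ucb_score(total_value, visits, parent_visits, c):
    """UCB1 score of a child node; the child with the highest score is selected next."""
    if visits == 0:
        return float("inf")  # always try unvisited children first
    exploitation = total_value / visits  # average return observed so far
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration
```

A larger c pushes the search towards rarely visited branches; a smaller c makes it greedier with respect to the returns observed so far.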
The MCTS implementation was based on the explanation of the algorithm offered by Sutton & Barto, and on an MCTS implementation made for the game of Tic-Tac-Toe by J. Zhang (see references below). To enable larger-scale experimentation, parallelisation code was added.
For the reinforcement learning component, we explore the Actor-Critic Algorithm and perform experiments with it on a simple MDP. This part of the assignment guided us step-by-step through the fundamental concepts of Policy Gradient methods, and culminated in a from-scratch implementation of the Advantage Actor-Critic algorithm.
The report features an elaborate description of the implementation process, and results of using Actor-Critic and Advantage Actor-Critic for policy optimisation.
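For orientation, one-step Advantage Actor-Critic uses the TD error delta = r + gamma * V(s') - V(s) as an estimate of the advantage: the critic is moved towards the TD target, while the actor's policy parameters are moved along delta * grad log pi(a|s). A minimal tabular sketch of a single update (illustrative only; the notebook's implementation may differ in representation and detail):

```python
import numpy as np

n_states, n_actions = 5, 2
theta = np.zeros((n_states, n_actions))  # actor: softmax policy preferences
v = np.zeros(n_states)                   # critic: state-value estimates
alpha_actor, alpha_critic, gamma = 0.1, 0.1, 0.99

def policy(s):
    prefs = theta[s] - theta[s].max()    # shift for numerical stability
    probs = np.exp(prefs)
    return probs / probs.sum()

def update(s, a, r, s_next, done):
    # TD error doubles as the advantage estimate: delta = r + gamma*V(s') - V(s)
    target = r if done else r + gamma * v[s_next]
    delta = target - v[s]
    v[s] += alpha_critic * delta                # critic: move towards the TD target
    grad_log = -policy(s)                       # gradient of log softmax ...
    grad_log[a] += 1.0                          # ... w.r.t. the preferences theta[s]
    theta[s] += alpha_actor * delta * grad_log  # actor: ascend the advantage-weighted gradient
```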
The code can be executed in a conda environment. I recommend installing Miniconda; a guide to do so can be found here.
To install the requirements for running the code, you can clone this repo and create a conda environment (which will be named vu_mas) with Python 3.12 and the necessary dependencies by running the following commands in your CLI:
```
git clone https://github.com/mklblm/VU-Multi-agent-Systems
cd VU-Multi-agent-Systems
conda env create -f environment.yml
```
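Then activate the newly created environment before running any of the code:

```
conda activate vu_mas
```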
Alternatively, you can create a Python 3.12 environment by other means, clone this repo, and install the required packages by running:
```
git clone https://github.com/mklblm/VU-Multi-agent-Systems
cd VU-Multi-agent-Systems
pip install -r requirements.txt
```
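If you prefer not to use conda, one option for the "other means" above is Python's built-in venv module (assuming a python3.12 interpreter is on your PATH); create and activate the environment before running the pip install:

```
python3.12 -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
```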
MCTS:
The MCTS experiments can be run by executing the mcts_main.py script from the CLI:

```
python mcts_main.py
```
The mcts_main.py file also contains all the hyperparameters for the experimental setup. The experiment results will be saved to the mcts_results directory. I've also included the Jupyter notebook mcts_results_analysis.ipynb, which was used to visualise the results and perform the statistical tests used in the report.
Policy Gradient - Actor-Critic:
The experiments for the reinforcement learning component of the assignment are all contained in the reinforcement_learning.ipynb Jupyter notebook. Simply run all the cells in the notebook to replicate the experiments.
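Assuming Jupyter is installed in the active environment, the notebook can be launched from the CLI with:

```
jupyter notebook reinforcement_learning.ipynb
```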
- Mikel Blom (GitHub)
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). The MIT Press. Direct download from Stanford University.
- Zhang, J. (2024). KnightZhang625/mcts_tic_tac (GitHub repository). Retrieved December 19, 2024.
This project is licensed under the MIT License - see the LICENSE.md file for details