A Monte Carlo Tree Search (MCTS) based bot for the game of Connect4
I wanted to learn Reinforcement Learning so I thought this would be a good first exercise. Plus, the algorithm used in this project is present in MuZero and AlphaGo so I really wanted to learn about it!
Previously I did a minimal version with a simpler game: tictactoe or "michi"(for spanish speakers). If you are a newcomer to RL as I am, I may suggest first revising the K-armed Bandit problem to better understand what we mean by "policy".
Connect Four is a two-player strategy game in which players take turns dropping colored discs into a vertical grid, aiming to be the first to form a horizontal, vertical, or diagonal line of four discs of the same color.
Monte Carlo Tree Search is a decision-making algorithm that builds a search tree by simulating random playouts from a given state, progressively refining its choices based on the results. It combines exploration of new moves with exploitation of known successful moves to find optimal actions. In particular, we employ the Upper Bound Confidence to set weights between these two.
Depending on the amount of computation budget that you give the program it may behave better or worse as it can have the opportunity to explore a larger space of possible states.
git clone https://github.com/TAOGenna/Connect4-MonteCarlo.git
python3 interface.py
If you want the algorithm to search for deeper states, you can change that number in interface.py
. Search for
def ai_move(self, budget=3000):
if self.game_over:
return
and change the variable budget
to a bigger number.
There is room for improvement beyond the UBC policy. See papers:
- Monte-Carlo Tree Search and Minimax Hybrids, Hendrik Baier and Mark H.M. Winands
- Score Bounded Monte-Carlo Tree Search, Tristan Cazenave and Abdallah Saffidine
- Principal reference: A Survey of Monte Carlo Tree Search Methods, Cameron Browne et al.
- To Alfredo de la Fuente. I was looking through his projects and this one seemed suitable for someone trying to learn the basics of RL, so thanks for open-source it and also for the references