Project 3 of the Deep Reinforcement Learning Nanodegree
The model used to generate this gif is `final.pth` (MADDPG with two agents that share the networks in a self-play fashion), which was trained for 2000 episodes using `main.py`.
The environment for this project is Unity's Tennis, provided in the `setup/` folder. This repository contains an implementation of the MADDPG algorithm (although not directly from pixels); in this case, the two agents share the same actor and critic networks, allowing for a self-play mechanism.
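As a minimal sketch of this idea (the actual architecture lives in `code/`; the class and layer sizes here are illustrative, not the trained model's), both agents' observations can be batched through a single shared actor:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Illustrative actor: maps a 24-dim observation to 2 actions in [-1, 1]."""
    def __init__(self, state_size=24, action_size=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, action_size), nn.Tanh(),  # tanh keeps actions in [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

# Self-play: one network serves both agents. `states` has shape (2, 24),
# one row per agent, so the shared actor produces a (2, 2) action matrix.
shared_actor = Actor()
states = torch.randn(2, 24)
actions = shared_actor(states)
```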
For details on the implementation, see the report. Alternatively, you can find the pre-trained model under `models/` and the source code in `main.py` and `code/`.
Two agents control rackets to bounce a ball over a net. The goal is to keep the ball in play for as long as possible. If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets the ball hit the ground or hits it out of bounds, it receives a reward of -0.01.
The state space has 24 dimensions per agent: a stack of three consecutive 8-variable vectors, each describing the ball and the agent's racket at a specific frame.
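For reference, this is how the observations look when queried through the `unityagents` package used in this Nanodegree (the path to the Tennis executable is a placeholder):

```python
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name='Tennis.app')  # placeholder path to the executable
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=True)[brain_name]
states = env_info.vector_observations
print(states.shape)  # (2, 24): one 24-dimensional observation per agent
```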
At each timestep, each agent outputs an action with 2 continuous dimensions, representing the horizontal and vertical movement of the racket. Every value must lie between -1 and 1.
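Continuing the sketch above, a random policy only needs to emit a `(2, 2)` array clipped to that range:

```python
import numpy as np

actions = np.clip(np.random.randn(2, 2), -1, 1)  # one 2-dim action per agent
env_info = env.step(actions)[brain_name]
rewards = env_info.rewards       # list of 2 rewards, one per agent
dones = env_info.local_done      # whether the episode ended for each agent
```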
The task is episodic. Each episode's score is the maximum of the two agents' summed rewards, and the environment is considered solved when the average of these scores over 100 consecutive episodes reaches +0.5.
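In code, this criterion amounts to the following sketch, where `rewards_per_step` (an assumed name) collects the two agents' rewards at every timestep of one episode:

```python
import numpy as np
from collections import deque

scores_window = deque(maxlen=100)

# At the end of each episode: sum each agent's rewards over the episode,
# then keep the maximum of the two undiscounted returns as the episode score.
episode_returns = np.sum(rewards_per_step, axis=0)  # shape (2,)
scores_window.append(np.max(episode_returns))

solved = len(scores_window) == 100 and np.mean(scores_window) >= 0.5
```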
Note that this was tested on macOS only.
You'll need conda to prepare the environment and execute the code. The other resources are already available in this repository under `setup/`, so you can simply clone it.
```sh
git clone https://github.com/francescotorregrossa/deep-reinforcement-learning-nanodegree.git
cd deep-reinforcement-learning-nanodegree/p3-collab-compet
```
Optionally, you can install jupyter if you want to work on the report notebook.
This will create an environment named `p3_collab_compet` and install the required libraries.
```sh
conda create --name p3_collab_compet python=3.6
conda activate p3_collab_compet
unzip setup.zip
pip install ./setup
```
You can use `main.py` to watch two agents play the game. The provided model `final.pth` is a MADDPG with self-play, so it contains only one actor, which is used by both agents.
```sh
python main.py
```
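The exact checkpoint layout is defined by the training code in `code/`, but if `final.pth` stores the shared actor's weights as a plain state dict (an assumption here), loading it for evaluation would look roughly like this, reusing the illustrative `Actor` class from the sketch above:

```python
import torch

actor = Actor()  # illustrative class from the earlier sketch
state_dict = torch.load('final.pth', map_location='cpu')
actor.load_state_dict(state_dict)  # assumes the file holds a plain state_dict
actor.eval()

with torch.no_grad():
    # states: the (2, 24) observation matrix from the environment
    actions = actor(torch.from_numpy(states).float())
```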
You can also use `main.py` to train a new agent. If you want to change the configuration, you'll find the replay buffer's capacity in `main.py`, the network architectures in `Actor.py` and `Critic.py`, and the rest of the hyperparameters in `MADDPG.py`. The report also contains useful functions for plotting results with matplotlib; a minimal sketch is included after the note below.
```sh
python main.py -t
```
Note that this script will overwrite `final.pth` and could take a few hours to run!
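If you'd rather plot the results outside the notebook, a minimal matplotlib sketch (assuming `scores` is the list of episode scores produced during training) looks like this:

```python
import numpy as np
import matplotlib.pyplot as plt

window = 100
moving_avg = [np.mean(scores[max(0, i - window + 1):i + 1]) for i in range(len(scores))]

plt.plot(scores, alpha=0.4, label='episode score')
plt.plot(moving_avg, label='100-episode average')
plt.axhline(0.5, linestyle='--', color='grey', label='solve threshold')
plt.xlabel('Episode')
plt.ylabel('Score')
plt.legend()
plt.show()
```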
```sh
python -m ipykernel install --user --name p3_collab_compet --display-name "p3_collab_compet"
jupyter notebook
```
Make sure to set the kernel to `p3_collab_compet` after you open the report.
```sh
conda deactivate
conda remove --name p3_collab_compet --all
jupyter kernelspec uninstall p3_collab_compet
```