Project 2 of the Deep Reinforcement Learning Nanodegree
The model used to generate this gif is `final.pth` (A2C-GAE with n = 8 steps), which was trained for 600 episodes using `main.py`.
The environment for this project is Reacher from Unity, provided in the `setup/` folder. In particular, the version with 20 agents is used. This repository contains an implementation of the A2C algorithm, a synchronous version of A3C, and a variant that uses GAE.
For details on the implementation and a comparison between the models, see the report. Alternatively, you can find some pre-trained models under `models/` and the source code in `main.py` and `code/`.
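As a rough illustration of the GAE variant, the advantage over an n-step rollout (n = 8 here) is a discounted sum of TD errors. The snippet below is a minimal NumPy sketch assuming per-step rewards, value estimates, and done flags have already been collected; the function and variable names are illustrative and not taken from `code/`.

```python
import numpy as np

def gae_advantages(rewards, values, next_value, dones, gamma=0.99, lam=0.95):
    """Illustrative GAE(lambda) over an n-step rollout (not the repository's actual code).

    rewards, values, dones: arrays of length n (one entry per rollout step).
    next_value: bootstrap value estimate for the state after the last step.
    """
    n = len(rewards)
    advantages = np.zeros(n)
    gae = 0.0
    for t in reversed(range(n)):
        mask = 1.0 - dones[t]                                   # stop bootstrapping at episode ends
        v_next = next_value if t == n - 1 else values[t + 1]
        delta = rewards[t] + gamma * v_next * mask - values[t]  # one-step TD error
        gae = delta + gamma * lam * mask * gae                  # discounted sum of TD errors
        advantages[t] = gae
    return advantages
```

With `lam = 1` this reduces to the plain n-step advantage used by vanilla A2C, which is one way to see the two variants as points on the same spectrum.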
There are 20 independent agents, each with a double-jointed arm that can move to target locations. At every timestep the goal location changes, and the objective is to keep the arm at the target location for as long as possible. A reward of +0.1 is provided for each timestep the arm is at the goal location.
The state space consists of 33 variables corresponding to the position, rotation, velocity, and angular velocity of the arm.
Each action is a vector of 4 continuous values in the range [-1, 1], corresponding to the torques applied to the two joints.
The task is episodic, and each episode is scored as the average of all the agents' scores. The task is solved when the average score over 100 consecutive episodes reaches +30 or more.
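To make the scoring rule concrete, here is a small sketch of how the episode score and the solved check could be computed (assuming `episode_agent_scores` holds the 20 per-agent returns of a finished episode; the names are illustrative, not the ones used in the repository):

```python
from collections import deque
import numpy as np

scores_window = deque(maxlen=100)   # rolling window of the last 100 episode scores

def record_episode(episode_agent_scores):
    """episode_agent_scores: array with the 20 agents' undiscounted returns for one episode."""
    episode_score = np.mean(episode_agent_scores)   # an episode is scored as the average over agents
    scores_window.append(episode_score)
    solved = len(scores_window) == 100 and np.mean(scores_window) >= 30.0
    return episode_score, solved
```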
Note that this was tested on macOS only.
You'll need conda to prepare the environment and execute the code.
Other resources are already available in this repository under `setup/`, so you can simply clone it.
git clone https://github.com/francescotorregrossa/deep-reinforcement-learning-nanodegree.git
cd deep-reinforcement-learning-nanodegree/p2-continuous-control
Optionally, you can install jupyter if you want to work on the report notebook.
This will create an environment named `p2_continuous_control` and install the required libraries.
conda create --name p2_continuous_control python=3.6
conda activate p2_continuous_control
unzip setup.zip
pip install ./setup
You can use `main.py` to watch an agent play the game. The provided model `final.pth` is an A2C-GAE agent with n = 8 steps.
python main.py
If you want to try another configuration, you can use one of the files under `models/`, but note that you might also need to change this line in `main.py`.
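If you'd rather drive the environment by hand (for example, to sanity-check the setup), the loop below shows the raw `unityagents` interface with random actions standing in for the trained policy. The `Reacher.app` path is an assumption for the macOS build shipped in `setup/`; adjust it to your local file name.

```python
import numpy as np
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="Reacher.app")   # assumed path to the 20-agent Reacher build
brain_name = env.brain_names[0]
num_agents = 20

env_info = env.reset(train_mode=False)[brain_name]
states = env_info.vector_observations             # shape: (20, 33)
scores = np.zeros(num_agents)
while True:
    # a trained policy would map states to actions here; random actions are a placeholder
    actions = np.clip(np.random.randn(num_agents, 4), -1, 1)
    env_info = env.step(actions)[brain_name]
    scores += env_info.rewards
    states = env_info.vector_observations
    if np.any(env_info.local_done):
        break
env.close()
print(f"Average score across agents: {scores.mean():.2f}")
```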
You can also use `main.py` to train a new agent. Again, if you want to change the configuration, you have to update this line. You'll find other classes and functions in the `code/` folder. The report also contains useful functions for plotting results with `matplotlib`.
python main.py -t
Note that this script will overwrite `final.pth`.
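The `-t` switch and the overwrite behaviour can be pictured roughly like this (an illustrative sketch of how `main.py` might branch; `train_agent` and `watch_agent` are hypothetical placeholders, not the repository's actual functions):

```python
import argparse
import torch

def train_agent():
    """Placeholder for the training loop implemented in code/ (hypothetical name)."""
    raise NotImplementedError

def watch_agent(checkpoint):
    """Placeholder for the watch loop shown earlier (hypothetical name)."""
    raise NotImplementedError

parser = argparse.ArgumentParser()
parser.add_argument("-t", "--train", action="store_true",
                    help="train a new agent instead of watching the saved one")
args = parser.parse_args()

if args.train:
    network = train_agent()                           # returns the trained torch.nn.Module
    torch.save(network.state_dict(), "final.pth")     # this is what overwrites final.pth
else:
    watch_agent("final.pth")
```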
python -m ipykernel install --user --name p2_continuous_control --display-name "p2_continuous_control"
jupyter notebook
Make sure to set the kernel to `p2_continuous_control` after you open the report.
conda deactivate
conda remove --name p2_continuous_control --all
jupyter kernelspec uninstall p2_continuous_control