Project 2 of the Deep Reinforcement Learning Nanodegree
The model used to generate this gif is `final.pth` (A2C-GAE with n = 8 steps), which was trained for 600 episodes using `main.py`.
The environment for this project is Reacher from Unity, provided in the `setup/` folder. In particular, the version with 20 agents is used. This repository contains an implementation of the A2C algorithm, a synchronous version of A3C, and a variant that uses GAE.
For details on the implementation and a comparison between the models, see the report. Alternatively, you can find some pre-trained models under `models/` and the source code in `main.py` and `code/`.
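As a rough illustration of the GAE variant, the advantage over an n-step rollout (n = 8 here) is a discounted sum of TD errors. The snippet below is a minimal NumPy sketch assuming per-step rewards, value estimates, and done flags have already been collected; the function and variable names are illustrative and not taken from `code/`.

```python
import numpy as np

def gae_advantages(rewards, values, next_value, dones, gamma=0.99, lam=0.95):
    """Illustrative GAE(lambda) over an n-step rollout (not the repository's actual code).

    rewards, values, dones: arrays of length n (one entry per rollout step).
    next_value: bootstrap value estimate for the state after the last step.
    """
    n = len(rewards)
    advantages = np.zeros(n)
    gae = 0.0
    for t in reversed(range(n)):
        mask = 1.0 - dones[t]                                   # stop bootstrapping at episode ends
        v_next = next_value if t == n - 1 else values[t + 1]
        delta = rewards[t] + gamma * v_next * mask - values[t]  # one-step TD error
        gae = delta + gamma * lam * mask * gae                  # discounted sum of TD errors
        advantages[t] = gae
    return advantages
```

With `lam = 1` this reduces to the plain n-step advantage used by vanilla A2C, which is one way to see the two variants as points on the same spectrum.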
There are 20 independent agents, each with a double-jointed arm that can move to target locations. At every timestep the goal location changes, and the objective is to keep the arm at the target location for as long as possible. A reward of +0.1 is provided for each timestep the arm is at the goal location.
The state space consists of 33 variables corresponding to the position, rotation, velocity, and angular velocity of the arm.
Each action is a vector of 4 continuous values in the range [-1, 1], corresponding to the torques applied to the two joints.
The task is episodic, and each episode is scored as the average of all the agents' scores. The task is solved when the average score over 100 consecutive episodes reaches +30 or more.
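To make the scoring rule concrete, here is a small sketch of how the episode score and the solved check could be computed (assuming `episode_agent_scores` holds the 20 per-agent returns of a finished episode; the names are illustrative, not the ones used in the repository):

```python
from collections import deque
import numpy as np

scores_window = deque(maxlen=100)   # rolling window of the last 100 episode scores

def record_episode(episode_agent_scores):
    """episode_agent_scores: array with the 20 agents' undiscounted returns for one episode."""
    episode_score = np.mean(episode_agent_scores)   # an episode is scored as the average over agents
    scores_window.append(episode_score)
    solved = len(scores_window) == 100 and np.mean(scores_window) >= 30.0
    return episode_score, solved
```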
Note that this was tested on macOS only.
You'll need conda to prepare the environment and execute the code.
Other resources are already available in this repository under `setup/`, so you can simply clone it.
git clone https://github.com/francescotorregrossa/deep-reinforcement-learning-nanodegree.git
cd deep-reinforcement-learning-nanodegree/p2-continuous-control
Optionally, you can install jupyter if you want to work on the report notebook.
This will create an environment named `p2_continuous_control` and install the required libraries.
conda create --name p2_continuous_control python=3.6
conda activate p2_continuous_control
unzip setup.zip
pip install ./setup
You can use `main.py` to watch an agent play the game. The provided model `final.pth` is an A2C-GAE agent with n = 8 steps.
python main.py
If you want to try another configuration, you can use one of the files under `models/`, but note that you might also need to change this line in `main.py`.
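If you'd rather drive the environment by hand (for example, to sanity-check the setup), the loop below shows the raw `unityagents` interface with random actions standing in for the trained policy. The `Reacher.app` path is an assumption for the macOS build shipped in `setup/`; adjust it to your local file name.

```python
import numpy as np
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="Reacher.app")   # assumed path to the 20-agent Reacher build
brain_name = env.brain_names[0]
num_agents = 20

env_info = env.reset(train_mode=False)[brain_name]
states = env_info.vector_observations             # shape: (20, 33)
scores = np.zeros(num_agents)
while True:
    # a trained policy would map states to actions here; random actions are a placeholder
    actions = np.clip(np.random.randn(num_agents, 4), -1, 1)
    env_info = env.step(actions)[brain_name]
    scores += env_info.rewards
    states = env_info.vector_observations
    if np.any(env_info.local_done):
        break
env.close()
print(f"Average score across agents: {scores.mean():.2f}")
```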
You can also use `main.py` to train a new agent. Again, if you want to change the configuration, you have to update this line. You'll find other classes and functions in the `code/` folder. The report also contains useful functions for plotting results with `matplotlib`.
python main.py -t
Note that this script will overwrite `final.pth`.
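The `-t` switch and the overwrite behaviour can be pictured roughly like this (an illustrative sketch of how `main.py` might branch; `train_agent` and `watch_agent` are hypothetical placeholders, not the repository's actual functions):

```python
import argparse
import torch

def train_agent():
    """Placeholder for the training loop implemented in code/ (hypothetical name)."""
    raise NotImplementedError

def watch_agent(checkpoint):
    """Placeholder for the watch loop shown earlier (hypothetical name)."""
    raise NotImplementedError

parser = argparse.ArgumentParser()
parser.add_argument("-t", "--train", action="store_true",
                    help="train a new agent instead of watching the saved one")
args = parser.parse_args()

if args.train:
    network = train_agent()                           # returns the trained torch.nn.Module
    torch.save(network.state_dict(), "final.pth")     # this is what overwrites final.pth
else:
    watch_agent("final.pth")
```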
python -m ipykernel install --user --name p2_continuous_control --display-name "p2_continuous_control"
jupyter notebook
Make sure to set the kernel to `p2_continuous_control` after you open the report.
conda deactivate
conda remove --name p2_continuous_control --all
jupyter kernelspec uninstall p2_continuous_control