
Udacity Deep Reinforcement Learning course - Policy-based methods - P2 Continuous control

This repository contains code that trains an agent to solve the environment proposed in the Policy Based Methods section of the Udacity Deep Reinforcement Learning (DRL) course.

Environment

(Animated preview of the Reacher environment.)

The agent in this environment is a double-jointed arm that can move to target locations. A reward of +0.1 is provided for each step that the agent's hand is in the goal location, so the goal of the agent is to maintain its position at the target location for as many time steps as possible. The environment is considered solved when the agent achieves an average score of 30 or more over 100 consecutive episodes.

Both the action and the state space are continuous. The observation space consists of 33 variables corresponding to position, rotation, velocity, and angular velocities of the arm. Each action is a vector with 4 numbers, corresponding to torque applicable to two joints. Every entry in the action vector must be a number between -1 and 1.

There are two versions of the environment: the first contains a single agent, while the second contains 20 agents acting simultaneously.
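
For orientation, a minimal interaction loop might look like the sketch below. It assumes the unityagents package used throughout the Udacity course and a local path to the downloaded Reacher build; both are assumptions, adjust to your setup.

import numpy as np
from unityagents import UnityEnvironment

# Assumed local path to the downloaded Reacher build
env = UnityEnvironment(file_name="Reacher_Linux/Reacher.x86_64")
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=False)[brain_name]
num_agents = len(env_info.agents)                              # 1 or 20, depending on the version
action_size = env.brains[brain_name].vector_action_space_size  # 4 torques
scores = np.zeros(num_agents)

while True:
    # Random actions, clipped to the valid [-1, 1] torque range
    actions = np.clip(np.random.randn(num_agents, action_size), -1, 1)
    env_info = env.step(actions)[brain_name]
    scores += env_info.rewards                                 # +0.1 per step in the goal location
    if np.any(env_info.local_done):
        break

print(f"Average score across agents: {scores.mean():.2f}")
env.close()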

Getting started

Unity environments

Unity itself doesn't need to be installed, since prebuilt environments are provided. They can be downloaded from the following links:

Version 1: One agent

Version 2: 20 agents

Python dependencies

The project uses Python 3.6 and relies on the Udacity Value Based Methods repository: clone it and follow the instructions in its README to install the necessary dependencies.

Instructions

The repository contains 2 scripts under the continuous_control package: train.py and play.py.

Train

The script train.py can be used to train the agent. The environment has been solved using the Deep Deterministic Policy Gradient (DDPG) algorithm. More details can be found in ipynb/report.ipynb.
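
As a rough orientation before diving into the notebook, the core DDPG learning step looks like the sketch below. It assumes PyTorch networks named actor, critic and their target copies; these are illustrative names rather than the repository's actual API.

import torch
import torch.nn.functional as F

def ddpg_update(actor, actor_target, critic, critic_target,
                actor_opt, critic_opt, batch, gamma=0.99, tau=1e-3):
    """One DDPG learning step from a sampled minibatch (sketch)."""
    # batch holds float tensors; dones is 1.0 for terminal transitions
    states, actions, rewards, next_states, dones = batch

    # Critic: regress Q(s, a) towards the bootstrapped TD target
    with torch.no_grad():
        next_actions = actor_target(next_states)
        q_targets = rewards + gamma * critic_target(next_states, next_actions) * (1 - dones)
    critic_loss = F.mse_loss(critic(states, actions), q_targets)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: ascend the critic's estimate of Q(s, mu(s))
    actor_loss = -critic(states, actor(states)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft-update the target networks towards the local ones
    for target, local in ((actor_target, actor), (critic_target, critic)):
        for t_param, l_param in zip(target.parameters(), local.parameters()):
            t_param.data.copy_(tau * l_param.data + (1.0 - tau) * t_param.data)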

The script accepts the following arguments:

  • env-path: path pointing to the Unity Reacher environment

  • episodes: number of episodes the agent should be trained for

  • time-steps-per-episode: timesteps per episode

  • weights-path: path where the agent's NN weights will be stored

  • learning-rate-actor: Actor learning rate

  • learning-rate-critic: Critic learning rate

  • weight-decay-actor: Actor NN weight decay rate

  • weight-decay-critic: Critic NN weight decay rate

  • gamma: discount factor

  • batch-size: size of the agent's experience replay buffer

  • update-every: update the actor and critic networks every t timesteps

  • noise-scalar: scalar controlling the magnitude of the noise used when perturbing the actor's weights (see the sketch after this list)

  • noise-scalar-decay: factor by which the actor noise should increase or decrease in each iteration

  • noise-distance: distance between the actor and its noised version, used to update the noise scalar
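
Taken together, the three noise arguments point to an adaptive parameter-space noise scheme: the actor's weights are perturbed, and the perturbation scale grows or shrinks depending on how far the noised policy's actions drift from the clean policy's. A hedged sketch of that adaptation rule, with illustrative names rather than the repository's actual functions:

import numpy as np

def adapt_noise_scalar(noise_scalar, noise_scalar_decay, noise_distance,
                       actions, noised_actions):
    """Grow or shrink the parameter noise based on the observed policy drift (sketch)."""
    # Distance between the clean and the noised actor, measured in action space
    distance = np.sqrt(np.mean((actions - noised_actions) ** 2))
    if distance > noise_distance:
        noise_scalar /= noise_scalar_decay   # too much drift: shrink the noise
    else:
        noise_scalar *= noise_scalar_decay   # too little drift: grow the noise
    return noise_scalar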

Example:

python train.py --env-path /home/carlos/cursos/udacity_rl_2023/repos/deep-reinforcement-learning/p2_continuous-control/Reacher_Linux_env2/Reacher.x86_64 \
  --weights-path /home/carlos/cursos/udacity_rl_2023/projects/drl_p2_continous_control/weights \
  --episodes 300

Play

A trained agent can be used to play! To do so, the play.py script can be used, providing the Unity environment and the agent's weights paths:

python play.py --env-path /home/carlos/cursos/udacity_rl_2023/repos/deep-reinforcement-learning/p2_continuous-control/Reacher_Linux_env2/Reacher.x86_64 \
  --weights-path /home/carlos/cursos/udacity_rl_2023/projects/drl_p2_continous_control/weights
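
Under the hood, playing amounts to loading the trained actor weights and acting deterministically, with no exploration noise. A minimal sketch, again assuming the unityagents API; the network architecture and checkpoint file name below are illustrative and may differ from the repository's:

import numpy as np
import torch
import torch.nn as nn
from unityagents import UnityEnvironment

# Minimal actor matching the 33-dim state / 4-dim action spaces
# (illustrative architecture; the repository's actual network may differ)
actor = nn.Sequential(
    nn.Linear(33, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 4), nn.Tanh(),
)
actor.load_state_dict(torch.load("weights/checkpoint_actor.pth"))  # assumed file name
actor.eval()

env = UnityEnvironment(file_name="Reacher.x86_64")  # assumed local build path
brain_name = env.brain_names[0]
env_info = env.reset(train_mode=False)[brain_name]
scores = np.zeros(len(env_info.agents))

while not np.any(env_info.local_done):
    states = torch.from_numpy(env_info.vector_observations).float()
    with torch.no_grad():
        actions = actor(states).numpy()             # deterministic policy, no noise
    env_info = env.step(np.clip(actions, -1, 1))[brain_name]
    scores += env_info.rewards

print(f"Score: {scores.mean():.2f}")
env.close()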
