Reinforcement learning

This repository collects implementations of common reinforcement learning algorithms. So far I've implemented the following:

  • DQN with several variants: DDQN, dueling Q-networks, prioritized experience replay
  • A3C (threading-based)
  • PPO (sequential/threaded, GPU/CPU)

Currently I'm working on refactoring, since there is a lot of duplicated code. Initially I wanted every algorithm to be self-contained: for example, the DDQN implementation should be one block of code, without external dependencies. But most pieces of the code are very similar, and a more modular setup would make it easy to try DQN with different targets (TD(λ) etc., see the sketch below). So far I have merged all the different DQN versions into one package called dqn, but I'm undecided whether it will stay like this.
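
To illustrate what swappable targets could mean, here is a minimal sketch (my own illustration, not code from this repository): a one-step TD target and a TD(λ) return with compatible signatures, so a modular dqn package could plug in either.

```python
# A minimal sketch of interchangeable target functions. This is my own
# illustration, not code from this repository; all names are hypothetical.
import numpy as np

def one_step_targets(rewards, next_q, dones, gamma=0.99):
    # Standard DQN target: r + gamma * max_a Q(s', a), cut off at episode end.
    return rewards + gamma * next_q * (1.0 - dones)

def td_lambda_targets(rewards, values, dones, gamma=0.99, lam=0.95):
    # TD(lambda) returns computed backwards over one trajectory.
    # values has length T + 1: V(s_0) ... V(s_T); the last entry bootstraps.
    T = len(rewards)
    targets = np.zeros(T)
    g = values[T]
    for t in reversed(range(T)):
        g = rewards[t] + gamma * (1.0 - dones[t]) * (
            (1.0 - lam) * values[t + 1] + lam * g)
        targets[t] = g
    return targets
```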

A small update on the refactoring: I decided to keep the following classes (a minimal sketch of how they fit together follows the list):

  • Agent: responsible for interacting with the environment, keeping track of rewards (TD(λ)) and feeding the memory
  • Memory: stores the training samples; can be a simple buffer (A3C) or more complex (priority-based sampling for DQN)
  • Model: the function approximator for the policy, values, Q-values, whatever you are using
  • Brain: a wrapper around the model, responsible for training and updates
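
The sketch below shows one way these four classes could interact; the method names and signatures are my assumptions, not the repository's actual API.

```python
# A minimal sketch of how the four classes could fit together. The method
# names and signatures are my assumptions, not the repository's actual API.
import random
from collections import deque

class Memory:
    """Stores training samples; here a plain ring buffer (the DQN version
    could use priority-based sampling instead)."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, sample):
        self.buffer.append(sample)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

class Model:
    """Function approximator for the policy, values or Q-values."""
    def predict(self, state):
        raise NotImplementedError

class Brain:
    """Wraps the model; responsible for training and updates."""
    def __init__(self, model, memory):
        self.model, self.memory = model, memory

    def train(self, batch_size=32):
        batch = self.memory.sample(batch_size)
        # ... compute targets from the batch and update the model ...

class Agent:
    """Interacts with the environment and feeds the memory."""
    def __init__(self, env, model, memory):
        self.env, self.model, self.memory = env, model, memory

    def run_episode(self):
        state, done = self.env.reset(), False
        while not done:
            action = self.model.predict(state)
            next_state, reward, done = self.env.step(action)
            self.memory.push((state, action, reward, next_state, done))
            state = next_state
```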

The refactoring is mostly done now; A3C, PPO and DQN all follow the same design principles. (Also note: before the refactoring I trained the various algorithms on Doom and Atari. I'll do that again to check whether anything has broken. TODO: remove this text after the tests have been successful :) )

Alongside the refactoring, I've started to look at homebrew environments for reinforcement learning, such as a car that has to navigate to a target. The motivation is to learn about the complexity of tasks. For example, many Atari games give fairly immediate rewards (paddle missed the ball: -1), while in this car environment the first reward only arrives after quite a few timesteps. How hard is that, actually? For the homebrewing, I've added a very small interface (sketched below the list) which wraps:

  • vizdoom
  • OpenAI Atari
  • the homebrew car simulation
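
A minimal sketch of what such a wrapper interface could look like. The class names here are hypothetical, and the Atari example assumes the classic 4-tuple gym step API.

```python
# A minimal sketch of a common environment interface. Class names are
# hypothetical; the gym step signature is the classic 4-tuple API.
class Environment:
    """Common interface over vizdoom, OpenAI Atari and the car simulation."""
    def reset(self):
        """Start a new episode; return the initial observation."""
        raise NotImplementedError

    def step(self, action):
        """Apply an action; return (observation, reward, done)."""
        raise NotImplementedError

class AtariEnvironment(Environment):
    def __init__(self, name="Breakout-v0"):
        import gym
        self.env = gym.make(name)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, _info = self.env.step(action)
        return obs, reward, done
```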

The following is a policy learned by PPO on the car simulation. The green dot is the car; it must navigate towards the white dot while avoiding the red dots. This environment returns a list of positions in polar coordinates. The repository contains lots of helper methods, for example to render the polar-coordinate representation to a numpy array and export it as a gif:

[gif: PPO policy on the homebrew car simulation]
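
A minimal sketch of such a rendering helper. The function name and the use of imageio are my assumptions, not the repository's actual helpers.

```python
# A minimal sketch of rendering a polar-coordinate observation to a numpy
# image. Names and the imageio usage are assumptions, not the repo's helpers.
import numpy as np
import imageio

def polar_to_image(points, size=84, max_radius=1.0):
    """points: iterable of (radius, angle) pairs around the car.
    Returns a size x size grayscale image with one pixel per point."""
    img = np.zeros((size, size), dtype=np.uint8)
    center = size // 2
    for radius, angle in points:
        x = int(center + center * (radius / max_radius) * np.cos(angle))
        y = int(center + center * (radius / max_radius) * np.sin(angle))
        if 0 <= x < size and 0 <= y < size:
            img[y, x] = 255
    return img

# Render one frame per observation and write an animated gif:
# frames = [polar_to_image(obs) for obs in episode_observations]
# imageio.mimsave("car_policy.gif", frames, fps=30)
```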
