Simple implementation of deep reinforcement learning algorithms in TensorFlow 2.x

RL playground

Implementation of:

  • Q-learning to solve OpenAI's MountainCar-v0
  • PPO with TensorFlow 2.7 to solve OpenAI's CartPole-v1
  • SAC with TensorFlow 2.7 to solve OpenAI's MountainCarContinuous-v0

The implementation is kept deliberately simple so that PPO is easy to follow. All steps are contained in two files. agent.py implements the actor and critic networks, the forward pass (values and actions), and the loss function. runner.py implements the rollout across multiple parallel environments, the advantage calculation, and the training of the agent on mini-batches.
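To make the advantage calculation and the loss more concrete, here is a minimal sketch in plain Python. It is not the code from agent.py or runner.py: it assumes generalized advantage estimation (GAE) and the standard clipped PPO surrogate, and the function names `compute_gae` and `clipped_policy_loss`, as well as the default values for `gamma`, `lam`, and `clip`, are hypothetical choices for illustration.

```python
import math


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for a single environment's rollout.

    Assumed convention: dones[t] is True when the episode ended after step t,
    and last_value is the critic's estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each step can reuse the advantage of the next step.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Value targets for the critic are advantages plus the old value estimates.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def clipped_policy_loss(new_logp, old_logp, advantages, clip=0.2):
    """Mean clipped surrogate loss over a mini-batch (to be minimized)."""
    losses = []
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated and the rollout policy.
        ratio = math.exp(nl - ol)
        clipped = max(1.0 - clip, min(1.0 + clip, ratio))
        # Take the pessimistic of the two objectives, negated to form a loss.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

In a TensorFlow version the same arithmetic would typically run on tensors for all parallel environments at once, and the policy loss would be combined with a value loss and, often, an entropy bonus.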

In the future, I plan to extend the repo with other RL algos (e.g. A3C or continuous PPO).

Getting Started

Prerequisites (example using pyenv for Python version management on macOS):

  • macOS
  • pyenv installed

Clone the repo and install the requirements with the correct Python version:

git clone https://github.com/mesjou/rl-playground.git && cd rl-playground

pyenv install 3.8.10
pyenv virtualenv 3.8.10 rl-project
pyenv local rl-project

python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt

Run Game

Run the game and visualize the training run in TensorBoard:

python3 cartpole_ppo.py
tensorboard --logdir=runs/

Authors

References

I have been heavily relying on the cleanrl repo:

Additional resources for PPO:

Additional resources for SAC:
