# RL playground

Implementation of:

- Q-learning to solve OpenAI's MountainCar-v0 (a minimal sketch follows below)
- PPO with TensorFlow 2.7 to solve OpenAI's CartPole-v1
- SAC with TensorFlow 2.7 to solve OpenAI's MountainCarContinuous-v0
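For orientation, here is a minimal tabular Q-learning sketch for MountainCar-v0. It is not the repo's implementation: the state discretization, bin counts, and hyperparameters are illustrative assumptions, and it uses the classic gym API (`reset()` returning only the observation) that matches the TensorFlow 2.7 era.

```python
import numpy as np
import gym

env = gym.make("MountainCar-v0")

# Discretize the continuous (position, velocity) state into a grid.
# Bin counts and hyperparameters are illustrative, not the repo's values.
n_bins = (20, 20)
low, high = env.observation_space.low, env.observation_space.high
q_table = np.zeros(n_bins + (env.action_space.n,))

def discretize(obs):
    ratios = (obs - low) / (high - low)
    idx = (ratios * (np.array(n_bins) - 1)).astype(int)
    return tuple(np.clip(idx, 0, np.array(n_bins) - 1))

alpha, gamma, epsilon = 0.1, 0.99, 0.1
for episode in range(5000):
    state = discretize(env.reset())
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        obs, reward, done, _ = env.step(action)
        next_state = discretize(obs)
        # Q-learning update: move Q(s, a) toward the bootstrapped target.
        target = reward + gamma * np.max(q_table[next_state]) * (not done)
        q_table[state + (action,)] += alpha * (target - q_table[state + (action,)])
        state = next_state
```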

The implementation's simplicity makes PPO straightforward to understand. All steps are contained in two files: `agent.py` implements the actor and critic networks, the forward pass (values/actions), and the loss function. `runner.py` implements the rollout of multiple parallel environments, the advantage calculation, and training of the agent on mini-batches.
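To illustrate the loss step in `agent.py`, here is a hedged TensorFlow sketch of the PPO clipped surrogate objective (Schulman et al., 2017). The function name, signature, and default clipping coefficient are assumptions for illustration, not the repo's actual code:

```python
import tensorflow as tf

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio pi_theta(a|s) / pi_old(a|s), computed in log space.
    ratio = tf.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    # Clipping removes the incentive to push the ratio outside [1-eps, 1+eps].
    clipped = tf.clip_by_value(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the pessimistic (elementwise minimum) surrogate;
    # return its negation so an optimizer can minimize it.
    return -tf.reduce_mean(tf.minimum(unclipped, clipped))
```

For the advantage calculation in `runner.py`, a common choice (used, for example, in cleanrl) is Generalized Advantage Estimation. Whether this repo uses GAE or plain discounted returns is not stated here, so treat this NumPy sketch as one plausible version with illustrative `gamma`/`lam` defaults:

```python
import numpy as np

def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    # rewards, values, dones: rollout buffers of shape (T, n_envs);
    # dones[t] marks episodes that ended after step t.
    # last_value: critic estimate for the state after the final step, shape (n_envs,).
    advantages = np.zeros_like(rewards)
    gae = np.zeros_like(last_value)
    for t in reversed(range(rewards.shape[0])):
        next_value = last_value if t == rewards.shape[0] - 1 else values[t + 1]
        mask = 1.0 - dones[t]  # zero out bootstrapping across episode ends
        delta = rewards[t] + gamma * next_value * mask - values[t]  # TD error
        gae = delta + gamma * lam * mask * gae
        advantages[t] = gae
    returns = advantages + values  # value-function regression targets
    return advantages, returns
```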

In the future, I plan to extend the repo with other RL algos (e.g. A3C or continuous PPO).

## Getting Started

Prerequisites (example using pyenv for Python version management on macOS):

- macOS
- pyenv installed

Clone the repo and install the requirements with the correct Python version:

```sh
git clone https://github.com/mesjou/rl-playground.git && cd rl-playground

pyenv install 3.8.10
pyenv virtualenv 3.8.10 rl-project
pyenv local rl-project

python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt
```

## Run Game

Run the game and visualize training in TensorBoard:

```sh
python3 cartpole_ppo.py
tensorboard --logdir=runs/
```

## Authors

## References

I have relied heavily on the cleanrl repo:

Additional resources for PPO:

Additional resources for SAC: