Re-implementations of Deep Reinforcement Learning (DRL) algorithms, written in PyTorch.
pip install -r requirements.txt
- Deep Q Networks (DQN) [paper] [official code]
- Double Deep Q Networks (DDQN) [paper]
- Dueling Network Architectures for Deep Reinforcement Learning (DuelDQN) [paper]
- Continuous control with deep reinforcement learning (DDPG) [paper]
- Addressing Function Approximation Error in Actor-Critic Methods (TD3) [paper] [official code]
- Soft Actor-Critic Algorithms and Applications (SAC) [paper] [official code]
- Trust Region Policy Optimization (TRPO) [paper] [official code]
- Proximal Policy Optimization (PPO) [paper] [official code]
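As a rough illustration of how the first two entries in this list differ, the sketch below contrasts the bootstrap target of DQN with that of Double DQN for a single transition. It is not taken from this repository: the function names `dqn_target` and `double_dqn_target` are hypothetical, and plain Python lists stand in for PyTorch tensors so the snippet runs without dependencies.

```python
# Illustrative sketch only; names and structure are assumptions, not this repo's API.

def dqn_target(reward, done, next_q_target, gamma=0.99):
    """DQN: the target network both selects and evaluates the next action."""
    return reward + gamma * (1.0 - done) * max(next_q_target)


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN: the online network selects the next action,
    and the target network evaluates that action."""
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * (1.0 - done) * next_q_target[best_action]


# Q-values for the next state under the online and target networks (made-up numbers).
next_q_online = [1.0, 2.0, 0.5]
next_q_target = [3.0, 1.5, 0.0]

y_dqn = dqn_target(1.0, 0.0, next_q_target)
y_ddqn = double_dqn_target(1.0, 0.0, next_q_online, next_q_target)
print(y_dqn, y_ddqn)
```

With these numbers the Double DQN target is the smaller of the two, because the action preferred by the online network is not the one the target network rates highest. Decoupling selection from evaluation in this way is what reduces overestimation.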
# train an RL agent
# by default, training results are stored in the `runs` directory
python train_agent.py agent=ppo env.id=Hopper-v5
# plot the training results
python plot.py
# collect expert demonstrations
python collect_demo.py env.id=Hopper-v5 expert_model_path=models/hopper_sac_expert.pt
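The commands above pass settings as `key=value` pairs, with dots addressing nested fields (for example `env.id=Hopper-v5`). The sketch below shows one way such overrides could be merged into a nested configuration. It is an assumption for illustration, not this repository's actual parser, which may rely on a configuration library instead. The helper `apply_overrides` and the default values are hypothetical.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with 'a.b=value' strings merged in.

    Values are kept as strings; a real parser would also convert types.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            # Walk down the nested dict, creating levels that do not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


# Hypothetical defaults; the real project may use different keys and values.
defaults = {"agent": "sac", "env": {"id": "Hopper-v5"}}
config = apply_overrides(defaults, ["agent=ppo", "env.id=Hopper-v5"])
print(config)  # {'agent': 'ppo', 'env': {'id': 'Hopper-v5'}}
```

Working on a deep copy leaves the defaults untouched, so the same base configuration can be reused across runs.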
While developing this project, I found many open-source materials to be excellent references, and I am deeply grateful to their authors for their work. They are listed below. I would also like to thank my friends from LAMDA-RL for our helpful discussions.
Codebase
- tianshou
- stable-baselines3
- stable-baselines-contrib
- stable-baselines
- spinningup
- RL-Adventure2
- unstable_baselines
- d4rl_evaluations
- TD3
- pytorch-trpo
Blog
Tutorial