Algorithms we are re-implementing or planning to re-implement:
ID | Agent | Classic | Atari | MuJoCo | Distributed | Pre-Training | SFs |
---|---|---|---|---|---|---|---|
1 | DQN | ✅ | ✅ | ☑️ | |||
2 | Double DQN | ✅ | |||||
3 | PER-DQN | ✅ | |||||
4 | Dueling DQN | ||||||
5 | A3C | ☑️ | |||||
6 | C51 | ||||||
7 | Noisy DQN | ||||||
8 | Rainbow | ✅ | ✅ | ||||
9 | R2D2 | ☑️ | |||||
10 | DERainbow | ✅ | |||||
11 | NGU | ☑️ | |||||
12 | Agent57 | ☑️ | | | | | |
Installation with a specific CUDA 11.3 build of PyTorch:

```bash
conda create -n pixel
conda activate pixel
pip install -e .
pip install numpy tqdm wandb
pip install opencv-python ale-py "gym[accept-rom-license]"
pip install torch==1.12.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
```
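If you installed the CUDA build, a quick optional sanity check (a minimal sketch, assuming you run it inside the activated pixel environment) is:

```bash
# Should print the torch version and True if the GPU is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```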
Or, with the default PyTorch build:

```bash
conda create -n pixel
conda activate pixel
pip install -e .
pip install numpy tqdm wandb
pip install opencv-python ale-py "gym[accept-rom-license]"
pip install torch
```
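You can also do a minimal import check of the remaining dependencies (just a sketch; the module names below are the standard import names of the packages installed above):

```bash
# opencv-python imports as cv2, ale-py as ale_py
python -c "import numpy, tqdm, wandb, cv2, gym, ale_py; print('dependencies OK')"
```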
You can find the full default configurations in pixel/configs, and you can override a few of them with command-line arguments.
```bash
conda activate pixel
python -m pixel.run --alg DERainbow --env ALE/Freeway-v5 --n-envs 0 --device 'cuda' --wb --seed 1 2 3
```
- `--alg` is the algorithm's name [DQN, DDQN, PER, Rainbow, DERainbow]
- `--env` is the environment id [e.g. Alien-v4, ALE/Alien-v5]
- `--n-envs` is the number of environments (0 (default): single non-vectorized setting; 1+: vectorized setting, see the example below)
- `--device` is the device used for training the networks (default: 'cpu')
- `--wb` activates W&B logging (default: False)
- `--seed` sets one or more random seeds (default: 0)
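For instance, a vectorized run could look like the sketch below; it uses only the flags documented above, and any ALE/...-v5 game id installed via ale-py should work in place of Pong:

```bash
# Rainbow on Pong with 4 parallel environments, logged to W&B, three seeds
python -m pixel.run --alg Rainbow --env ALE/Pong-v5 --n-envs 4 --device 'cuda' --wb --seed 1 2 3
```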
Atari 100k/200k DERainbow | W&B
Game | 100k | 200k |
---|---|---|
Alien | 912 ±338 | 861.3 ±117 |
Hero | 6815 ±1005 | 8587.7 ±2164 |
Freeway | 27.9 ±3 | 30.7 ±0.3 |
Pong | -18.3 ±4 | -16.6 ±3 |
Qbert | 772 ±364 | 2588.3 ±1633 |
Atari 200k xN DERainbow | W&B
Game | 200k | 200k x1 | 200k x2 | 200k x4 | 200k x8 | 200k x16 |
---|---|---|---|---|---|---|
Alien | 861.3 ±117 | 766 ±130 | 636.8 ±105 | |||
Hero | 8587.7 ±2164 | 7975.5 ±799 ||||
Freeway | 30.7 ±0.3 | 30.6 ±1 ||||
Pong | -16.6 ±3 | -12.7 ±2 | ||||
Qbert | 2588.3 ±1633 | 3196.7 ±1142 |
Atari 50M (200M frames) Rainbow x64 | W&B
Game | 2M | 5M | 10M | 20M | 30M | 40M | 50M |
---|---|---|---|---|---|---|---|
Alien | |||||||
Asterix |||||||
Boxing | |||||||
Breakout | |||||||
Hero | |||||||
Freeway | |||||||
Pong | |||||||
Qbert |
Game | DQN | DDQN | PER | Rainbow | R2D2 | NGU | Agent57 |
---|---|---|---|---|---|---|---|
Alien | |||||||
Asterix |||||||
Boxing | |||||||
Breakout | |||||||
Hero | |||||||
Freeway | |||||||
Pong | |||||||
Qbert |
This repo is adapted from AMMI-RL and many other great repos, and it mainly builds on the following papers (not necessarily in order):
[1] Human-Level Control Through Deep RL. Mnih et al. @ Nature 2015
[2] Deep RL with Double Q-learning. van Hasselt et al. @ AAAI 2016
[3] Prioritized Experience Replay. Schaul et al. @ ICLR 2016
[4] Dueling Network Architectures for Deep RL. Wang et al. @ ICLR 2016
[5] Asynchronous Methods for Deep RL. Mnih et al. @ ICML 2016
[6] A Distributional Perspective on RL. Bellemare et al. @ ICML 2017
[7] Noisy Networks for Exploration. Fortunato et al. @ ICLR 2018
[8] Rainbow: Combining Improvements in Deep RL. Hessel et al. @ AAAI 2018
[9] Recurrent Experience Replay in Distributed RL. Kapturowski et al. @ ICLR 2019
[10] When to use parametric models in reinforcement learning? van Hasselt et al. @ NeurIPS 2019
[11] Never Give Up: Learning Directed Exploration Strategies. Badia et al. @ ICLR 2020
[12] Agent57: Outperforming the Atari Human Benchmark. Badia et al. @ ICML 2020