# Agent Pixel

RL for Discrete-Control & Pixel-Based Environments

## Algorithms

Algorithms we are re-implementing or planning to re-implement:

| ID | Agent | Classic | Atari | MuJoCo | Distributed | Pre-Training | SFs |
|----|-------------|---------|-------|--------|-------------|--------------|-----|
| 1  | DQN         | ☑️ |  |  |  |  |  |
| 2  | Double DQN  |  |  |  |  |  |  |
| 3  | PER-DQN     |  |  |  |  |  |  |
| 4  | Dueling DQN |  |  |  |  |  |  |
| 5  | A3C         | ☑️ |  |  |  |  |  |
| 6  | C51         |  |  |  |  |  |  |
| 7  | Noisy DQN   |  |  |  |  |  |  |
| 8  | Rainbow     |  |  |  |  |  |  |
| 9  | R2D2        | ☑️ |  |  |  |  |  |
| 10 | DERainbow   |  |  |  |  |  |  |
| 11 | NGU         | ☑️ |  |  |  |  |  |
| 12 | Agent57     | ☑️ |  |  |  |  |  |

## How to use this code

### Installation (Linux Ubuntu/Debian)

```sh
conda create -n pixel
conda activate pixel   # activate the env before installing into it
pip install -e .
pip install numpy tqdm wandb
pip install opencv-python ale-py "gym[accept-rom-license]"
pip install torch==1.12.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
```
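
Optionally, a quick sanity check can confirm the install worked. This is a minimal sketch: it assumes the steps above completed and that the ALE env ids (e.g. ALE/Pong-v5, as used in the run examples below) are registered by ale-py.

```sh
# verify the CUDA build of PyTorch is active
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# verify the Atari ROMs were installed and an ALE env can be created
python -c "import gym; gym.make('ALE/Pong-v5'); print('ALE OK')"
```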

### Installation (MacOS)

```sh
conda create -n pixel
conda activate pixel   # activate the env before installing into it
pip install -e .
pip install numpy tqdm wandb
pip install opencv-python ale-py "gym[accept-rom-license]"
pip install torch
```
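
The same style of check works here; on Apple-silicon Macs the relevant torch backend is MPS rather than CUDA (again just a sketch; older torch builds may lack this attribute):

```sh
python -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"
```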

### Running Experiments

You can find the full default configurations in pixel/configs, but a few options can be overridden from the command line; the block below shows a typical invocation, and a second illustrative run follows the flag list.

```sh
conda activate pixel
python -m pixel.run --alg DERainbow --env ALE/Freeway-v5 --n-envs 0 --device 'cuda' --wb --seed 1 2 3
```
- `--alg`: algorithm name (one of: DQN, DDQN, PER, Rainbow, DERainbow)
- `--env`: environment id (e.g. Alien-v4, ALE/Alien-v5)
- `--n-envs`: number of environments (0, the default, runs a single non-vectorized environment; 1+ runs the vectorized setting)
- `--device`: device used for network training (default: 'cpu')
- `--wb`: enable W&B logging (default: False)
- `--seed`: one or more random seeds (default: 0)
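
For example, a vectorized DQN run on Pong over three seeds. This sketch only combines the flags documented above; the choice of game, env count, and seeds is purely illustrative:

```sh
conda activate pixel
# 4 parallel envs (vectorized setting), logged to W&B, repeated for seeds 1, 2, 3
python -m pixel.run --alg DQN --env ALE/Pong-v5 --n-envs 4 --device 'cuda' --wb --seed 1 2 3
```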

## Selected Results

**Atari 100k/200k DERainbow** | W&B

| Game    | 100k       | 200k          |
|---------|------------|---------------|
| Alien   | 912 ±338   | 861.3 ±117    |
| Hero    | 6815 ±1005 | 8587.7 ±2164  |
| Freeway | 27.9 ±3    | 30.7 ±0.3     |
| Pong    | -18.3 ±4   | -16.6 ±3      |
| Qbert   | 772 ±364   | 2588.3 ±1633  |

**Atari 200k xN DERainbow** | W&B

| Game    | 200k          | 200k x1      | 200k x2     | 200k x4 | 200k x8 | 200k x16 |
|---------|---------------|--------------|-------------|---------|---------|----------|
| Alien   | 861.3 ±117    | 766 ±130     | 636.8 ±105  |         |         |          |
| Hero    | 8587.7 ±2164  | 7975.5 ±799  |             |         |         |          |
| Freeway | 30.7 ±0.3     | 30.6 ±1      |             |         |         |          |
| Pong    | -16.6 ±3      | -12.7 ±2     |             |         |         |          |
| Qbert   | 2588.3 ±1633  | 3196.7 ±1142 |             |         |         |          |

**Atari 50M (200M frames) Rainbow x64** | W&B

| Game     | 2M | 5M | 10M | 20M | 30M | 40M | 50M |
|----------|----|----|-----|-----|-----|-----|-----|
| Alien    |    |    |     |     |     |     |     |
| Asterix  |    |    |     |     |     |     |     |
| Boxing   |    |    |     |     |     |     |     |
| Breakout |    |    |     |     |     |     |     |
| Hero     |    |    |     |     |     |     |     |
| Freeway  |    |    |     |     |     |     |     |
| Pong     |    |    |     |     |     |     |     |
| Qbert    |    |    |     |     |     |     |     |

**Atari 50M (200M frames) x64**

| Game     | DQN | DDQN | PER | Rainbow | R2D2 | NGU | Agent57 |
|----------|-----|------|-----|---------|------|-----|---------|
| Alien    |     |      |     |         |      |     |         |
| Asterix  |     |      |     |         |      |     |         |
| Boxing   |     |      |     |         |      |     |         |
| Breakout |     |      |     |         |      |     |         |
| Hero     |     |      |     |         |      |     |         |
| Freeway  |     |      |     |         |      |     |         |
| Pong     |     |      |     |         |      |     |         |
| Qbert    |     |      |     |         |      |     |         |

## Acknowledgement

This repo is adapted from AMMI-RL and builds on many other great open-source RL repos.

## References

[1] Human-Level Control Through Deep RL. Mnih et al. @ Nature 2015
[2] Deep RL with Double Q-learning. van Hasselt et al. @ AAAI 2016
[3] Prioritized Experience Replay. Schaul et al. @ ICLR 2016
[4] Dueling Network Architectures for Deep RL. Wang et al. @ ICLR 2016
[5] Asynchronous Methods for Deep RL. Mnih et al. @ ICML 2016
[6] A Distributional Perspective on RL. Bellemare et al. @ ICML 2017
[7] Noisy Networks for Exploration. Fortunato et al. @ ICLR 2018
[8] Rainbow: Combining Improvements in Deep RL. Hessel et al. @ AAAI 2018
[9] Recurrent Experience Replay in Distributed RL. Kapturowski et al. @ ICLR 2019
[10] When to Use Parametric Models in Reinforcement Learning? van Hasselt et al. @ NeurIPS 2019
[11] Never Give Up: Learning Directed Exploration Strategies. Badia et al. @ ICLR 2020
[12] Agent57: Outperforming the Atari Human Benchmark. Badia et al. @ ICML 2020