🌟 EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning 🌟

EvoRL Paper on arXiv

Introduction

EvoRL is a fully GPU-accelerated framework for Evolutionary Reinforcement Learning. It is implemented with JAX and provides end-to-end GPU-accelerated training pipelines covering the following processes:

  • Reinforcement Learning (RL)
  • Evolutionary Computation (EC)
  • Environment Simulation

EvoRL provides a highly efficient and user-friendly platform for developing and evaluating RL, EC, and EvoRL algorithms.

Note

For comprehensive guidance, please visit our Documentation, where you'll find detailed installation steps, tutorials, practical examples, and complete API references.

Note

EvoRL is a sister project of EvoX.

Highlights

  • End-to-end training pipelines: The training pipelines for RL, EC, and EvoRL are executed entirely on GPUs, eliminating the dense CPU-GPU communication of traditional implementations and fully utilizing the parallel computing capabilities of modern GPU architectures.
    • Most algorithms have a Workflow.step() function that is compatible with jax.jit and jax.vmap, supporting JIT compilation of the full computation graph and parallel training (see the toy sketch after the key-concepts overview below).
    • The maximum speed-up is up to 60x, depending on the algorithm; see Performance.
  • Easy integration between EC and RL: Thanks to the modular design, EC components are plug-and-play in workflows and cooperate with RL.
  • Implementation of EvoRL algorithms: We currently provide two popular paradigms in Evolutionary Reinforcement Learning: Evolution-guided Reinforcement Learning (e.g., ERL, CEM-RL) and Population-based AutoRL (e.g., PBT).
  • Unified Environment API: Supports multiple GPU-accelerated RL environment packages (e.g., Brax, gymnax, ...). Multiple Env Wrappers are also provided.
  • Object-oriented functional programming model: Classes define the static execution logic, while their running states are stored externally.

Overview of Key Concepts in EvoRL

  • Workflow defines the training logic of algorithms.
  • Agent defines the behavior of a learning agent, and its optional loss functions.
  • Env provides a unified interface for different environments.
  • SampleBatch is a data structure for continuous trajectories or shuffled transition batches.
  • The EC module provides EC components such as Evolutionary Algorithms (EAs) and related operators.
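
To make this execution model concrete, the following toy example mimics the pattern in plain JAX: a class holds only static logic, the running state lives in an external pytree, and the step function can be wrapped with jax.jit and jax.vmap. All class and field names here are invented for illustration; they are not EvoRL's actual API, whose interfaces are described in the documentation.

import jax
import jax.numpy as jnp
from typing import NamedTuple

class ToyState(NamedTuple):
    # External running state: a pytree passed into and returned from step()
    params: jnp.ndarray
    key: jax.Array
    iterations: jnp.ndarray

class ToyWorkflow:
    """Static training logic only; no mutable attributes (toy stand-in, not EvoRL's Workflow)."""

    def init(self, key: jax.Array) -> ToyState:
        return ToyState(params=jnp.zeros(3), key=key, iterations=jnp.zeros((), jnp.int32))

    def step(self, state: ToyState) -> ToyState:
        key, subkey = jax.random.split(state.key)
        # Placeholder "update": perturb the parameters with Gaussian noise
        params = state.params + 0.01 * jax.random.normal(subkey, state.params.shape)
        return ToyState(params=params, key=key, iterations=state.iterations + 1)

workflow = ToyWorkflow()

# JIT-compile the whole step and advance the external state for a few iterations
step_fn = jax.jit(workflow.step)
state = workflow.init(jax.random.PRNGKey(0))
for _ in range(10):
    state = step_fn(state)

# vmap the same logic to run 4 independent instances in parallel on one device
keys = jax.random.split(jax.random.PRNGKey(1), 4)
states = jax.vmap(workflow.init)(keys)
states = jax.jit(jax.vmap(workflow.step))(states)
print(states.iterations)  # [1 1 1 1]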

Installation

EvoRL is built on JAX, so JAX should be installed first; please follow the official JAX installation guide. Since EvoRL is currently under development, we recommend installing the package from source.

# Install the evorl package from source
git clone https://github.com/EMI-Group/evorl.git
cd evorl
pip install -e .
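
After installation, it can be helpful to confirm that the package imports and that JAX sees your GPU. This quick check is not part of the official instructions, just a convenient sanity test:

# Quick sanity check (run inside Python)
import evorl  # should import without errors after `pip install -e .`
import jax

# On a correctly configured machine this lists GPU devices (e.g. [CudaDevice(id=0)]);
# if only CPU devices appear, revisit the JAX installation guide.
print(jax.devices())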

For developers, see Contributing to EvoRL.

Quickstart

Training

EvoRL uses hydra to manage configs and run algorithms. Users can run algorithms from the CLI via python -m evorl.train, specifying the agent and env fields based on the related config file paths (*.yaml) in the configs folder.

# hierarchy of folder `configs/`
configs
β”œβ”€β”€ agent
β”‚   β”œβ”€β”€ ppo.yaml
β”‚   β”œβ”€β”€ ...
...
β”œβ”€β”€ config.yaml
β”œβ”€β”€ env
β”‚   β”œβ”€β”€ brax
β”‚   β”‚   β”œβ”€β”€ ant.yaml
β”‚   β”‚   β”œβ”€β”€ ...
β”‚   β”œβ”€β”€ envpool
β”‚   └── gymnax
└── logging.yaml

For example, to train the PPO agent defined in configs/agent/ppo.yaml on the Ant environment defined in configs/env/brax/ant.yaml, type the following command:

python -m evorl.train agent=ppo env=brax/ant

Then the PPO algorithm starts training. If multiple GPUs are detected, most algorithms will automatically be trained in distributed mode.
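
Because the CLI is driven by hydra, standard hydra multi-run sweeps also work. The command below is only a sketch: it assumes a hopper.yaml config exists under configs/env/brax/ alongside ant.yaml.

# Sweep the same PPO agent over two Brax environments (multi-run mode)
python -m evorl.train -m agent=ppo env=brax/ant,brax/hopper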

For more advanced usage, see our documentation: Training.

Logging

When not using multi-run mode (without -m), the outputs will be stored in ./outputs. When using multi-run mode (-m), the outputs will be stored in ./multirun. Specifically, when launching algorithms from the training scripts, the log file and checkpoint files will be stored in ./outputs|multirun/train|train_dist/<timestamp>/<exp-name>/.

By default, the script will enable two recorders for logging: LogRecorder and WandbRecorder. LogRecorder will save logs (*.log) in the above path, and WandbRecorder will upload the data to WandB, which provides beautiful visualizations.

Screenshot in WandB dashboard:

Algorithms

Currently, EvoRL supports 4 types of algorithms:

  • RL: A2C, PPO, IMPALA, DQN, DDPG, TD3, SAC
  • EA: OpenES, VanillaES, ARS, CMA-ES, algorithms from EvoX (PSO, NSGA-II, ...)
  • Evolution-guided RL: ERL-GA, ERL-ES, ERL-EDA, CEMRL, CEMRL-OpenES
  • Population-based AutoRL: PBT family (e.g., PBT-PPO, PBT-SAC, PBT-CSO-PPO)

RL Environments

By default, pip install evorl automatically installs the Brax environments. To use other supported environments, please install the additional environment packages. For example:

# ===== GPU-accelerated Environments =====
# gymnax Envs:
pip install gymnax
# Jumanji Envs:
pip install jumanji
# JaxMARL Envs:
pip install jaxmarl

# ===== CPU-based Environments =====
# EnvPool Envs: (also requires Python < 3.12)
pip install envpool "numpy<2.0.0"
# Gymnasium Envs:
pip install "gymnasium[atari,mujoco,classic-control,box2d]>=1.1.0"

Warning

These additional environments have limited support, and some algorithms are incompatible with them.

Supported Environments

  • Brax: robotic control
  • gymnax (experimental): classic control, bsuite, MinAtar
  • JaxMARL (experimental): multi-agent environments
  • Jumanji (experimental): games, combinatorial optimization
  • EnvPool (experimental): high-performance CPU-based environments
  • Gymnasium (experimental): standard CPU-based environments

Performance

Test settings:

  • Hardware:
    • 2x Intel Xeon Gold 6132 (56 logical cores in total)
    • 128 GiB RAM
    • 1x Nvidia RTX 3090
  • Task: Swimmer

Issues and Discussions

To keep our project organized, please use the appropriate section for your topics:

  • Issues – For bug reports and pull requests only. When submitting an issue, please provide clear details to help with troubleshooting.
  • Discussions – For general questions, feature requests, and other topics.

Before posting, kindly check existing issues and discussions to avoid duplicates. Thank you for your contributions!

Acknowledgement

Citing EvoRL

If you use EvoRL in your research and want to cite it in your work, please use:

@article{zheng2025evorl,
  author    = {Bowen Zheng and Ran Cheng and Kay Chen Tan},
  journal   = {arXiv},
  pages     = {},
  publisher = {arXiv},
  title     = {{EvoRL}: A GPU-accelerated Framework for Evolutionary Reinforcement Learning},
  volume    = {abs/2501.15129},
  year      = {2025}
}
