Yuanlin Duan, Guofeng Cui and He Zhu
Code for "Exploring the Edges of Latent State Clusters for Goal-Conditioned Reinforcement Learning" (NeurIPS 2024), a method to do efficent exploration for GCRL.
If you find our paper or code useful, please cite us:
```
@article{duan2024exploring,
  title={Exploring the Edges of Latent State Clusters for Goal-Conditioned Reinforcement Learning},
  author={Duan, Yuanlin and Cui, Guofeng and Zhu, He},
  journal={arXiv preprint arXiv:2411.01396},
  year={2024}
}
```
CE2 learns a latent space using a temporal distance estimator, so that distances in the latent space reflect the reachability of states. CE2 then clusters states that are easily reachable from one another by the current policy in this latent space, and traverses to states with significant exploration potential on the boundaries of these clusters before performing exploratory behavior. CE2 selects goal states at the edges of these latent state clusters for exploration because (1) less-explored regions are naturally adjacent to these boundaries, and (2) since states within each cluster are easily reachable from one another by the training policy, the agent is also capable of reaching states at the cluster boundaries.
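For intuition, here is a minimal sketch of the cluster-edge goal-selection idea, assuming latent states have already been embedded by a temporal-distance-aware encoder. The function name, the use of k-means, and the distance-to-center edge heuristic are illustrative assumptions, not the actual CE2 implementation.

```python
# Illustrative sketch of cluster-edge goal selection (not the actual CE2 code).
import numpy as np
from sklearn.cluster import KMeans

def select_edge_goals(latents, n_clusters=8, n_goals=16):
    """Pick latent states near cluster boundaries as candidate exploration goals.

    latents: (N, D) array of latent states from the replay buffer, assumed to be
    embedded so that Euclidean distance approximates temporal reachability.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(latents)
    # Distance from each state to its assigned cluster center; states far from
    # their own center lie near the cluster edge.
    dist_to_center = np.linalg.norm(
        latents - km.cluster_centers_[km.labels_], axis=1
    )
    # Return indices of the states farthest from their centers as edge goals.
    return np.argsort(-dist_to_center)[:n_goals]
```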
CE2 outperforms other exploration approaches across a variety of tasks.
```
CE2/
|- Config/              # config files for each environment
|- dreamerv2/           # CE2 implementation
|- dreamerv2/gc_main.py # main entry point for training
```
Install all dependencies:
```
pip install -r library.txt
```
Then install the package itself:
```
pip install -e .
```
We evaluate CE2 on six environments: Ant Maze, Point Maze, Walker, 3-Block Stacking, Block Rotation, and Pen Rotation.
MuJoCo installation: MuJoCo 2.0 is required.
Ant Maze, Point Maze, 3-Block Stack environments:

The mrl codebase contains the Ant Maze, Point Maze, and 3-Block Stack environments.
```
git clone https://github.com/hueds/mrl.git
```
Before testing these three environments, you should make sure that the mrl path is set in the PYTHONPATH:
```
export PYTHONPATH=<path to your mrl folder>
```
Walker environment:

Clone the lexa-benchmark and dm_control repos:
```
git clone https://github.com/hueds/dm_control
git clone https://github.com/hueds/lexa-benchmark.git
```
Set up dm_control as a local Python module:
```
cd dm_control
pip install .
```
Set LD_PRELOAD to the libGLEW path, and set the MUJOCO_GL and MUJOCO_RENDERER variables:
```
# if you want to run environments in the lexa-benchmark codebase
MUJOCO_GL=egl MUJOCO_RENDERER=egl LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/x86_64-linux-gnu/libGL.so PYTHONPATH=<path to your lexa-benchmark folder like "/home/edward/lexa-benchmark">
```
Training Scripts:
```
python dreamerv2/gc_main.py --configs <environment name in the config file, e.g. RotatePen> --logdir <your logdir path>
```
Use TensorBoard to check the results:
```
tensorboard --logdir ~/logdir/your_logdir_name
```
CE2 builds on many prior works, and we thank the authors for their contributions.