Question About Ant Behavior #277

SumeetBatra · 2022-12-15T22:19:14Z

Hi folks, I have a question about how the Ant environment behaves initially. I'm training RL policies with 2-layer MLPs using PPO and noticed that the initial rewards become quite negative before the policy begins to learn. I understand that this could be due to any number of differences in my PPO implementation, hyperparameters, model architecture, etc. However, when I visualize just a randomly initialized policy, I see that the ant sometimes flips over and accumulates large negative rewards until timeout termination. Here's a screenshot that visualizes what's happening.

Is this correct behavior on the environment side? I would have thought there would be some termination condition if the ant flips over like that so that this doesn't continue until timeout termination. Or maybe I'm missing something.

btaba · 2022-12-19T22:02:16Z

btaba
Dec 19, 2022
Maintainer

Hi @SumeetBatra, the environment should terminate when terminate_when_unhealthy is True (the default), see https://github.com/google/brax/blob/main/brax/envs/ant.py#L241. However, if you're using something like the AutoResetWrapper, the environment will get reset automatically. Perhaps your random initialization is flipping the Ant, and the env keeps getting reset to the unhealthy state?
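For readers following along, the health check being referred to can be sketched roughly as below. This is a simplified plain-Python paraphrase, not the Brax source: the function name `ant_done` is hypothetical, and the upper bound of 1.0 is an assumption (only the lower bound of 0.2 is mentioned in this thread).

```python
def ant_done(torso_z: float,
             healthy_z_range: tuple = (0.2, 1.0),  # assumed bounds, see note above
             terminate_when_unhealthy: bool = True) -> bool:
    """Return True if the episode should end because the torso height is unhealthy."""
    min_z, max_z = healthy_z_range
    is_healthy = min_z <= torso_z <= max_z
    return terminate_when_unhealthy and not is_healthy


print(ant_done(0.55))  # torso inside the range: no termination
print(ant_done(0.10))  # torso below the range: termination
print(ant_done(0.10, terminate_when_unhealthy=False))  # check disabled
```

Note that a check of this shape only looks at the torso height, so it says nothing about the torso's orientation.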

9 replies

SumeetBatra Dec 20, 2022
Author

Okay, after many attempts, I was able to reproduce it. Here's a short video with the torso's position and rotation visualized:

ant_flipped-2022-12-20_11.47.32.mp4

It looks like the z-pos never goes outside the healthy z-range, so perhaps that's the bug?
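If the height check alone cannot catch a flipped ant, one possible workaround is to also look at the torso's orientation. The sketch below is not part of Brax; `torso_up_z` and `is_flipped` are hypothetical helpers, and it assumes the rotation is a unit quaternion in (w, x, y, z) order. It computes the world-frame z component of the torso's local z-axis, which is 1 when upright and -1 when upside down.

```python
import math


def torso_up_z(quat):
    """World-frame z component of the torso's local z-axis for a unit quaternion (w, x, y, z)."""
    w, x, y, z = quat
    return 1.0 - 2.0 * (x * x + y * y)


def is_flipped(quat, threshold=0.0):
    """Treat the torso as flipped when its local z-axis points below the threshold."""
    return torso_up_z(quat) < threshold


upright = (1.0, 0.0, 0.0, 0.0)       # identity rotation
upside_down = (0.0, 1.0, 0.0, 0.0)   # 180 degree rotation about the x-axis
half = math.sqrt(0.5)
on_side = (half, half, 0.0, 0.0)     # 90 degree rotation about the x-axis

print(is_flipped(upright))
print(is_flipped(upside_down))
print(torso_up_z(on_side))           # close to zero
```

A termination condition could then combine this with the existing z-range check, with the threshold chosen to taste.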

SumeetBatra Dec 20, 2022
Author

and here's how I create my environment:

import functools

import gym

# brax_custom, to_torch, and _to_custom_env are imported/defined elsewhere (not shown).


def make_vec_env_brax(cfg):
    env_cfg = _to_custom_env[cfg.env_name]
    brax_env_name = env_cfg['custom_env_name']
    # Register the custom env with gym once, on first use.
    entry_point = functools.partial(brax_custom.create_gym_env, env_name=cfg.env_name)
    if brax_env_name not in gym.envs.registry.env_specs:
        gym.register(brax_env_name, entry_point=entry_point)

    # Clipping bounds are forwarded to create() through gym.make.
    act_bounds = env_cfg['action_clip']
    obs_bounds = env_cfg['obs_clip']
    rew_bounds = env_cfg['reward_clip']
    vec_env = gym.make(brax_env_name, batch_size=cfg.env_batch_size, seed=cfg.seed,
                       clip_actions=act_bounds, clip_rewards=rew_bounds, clip_obs=obs_bounds)
    vec_env = to_torch.JaxToTorchWrapper(vec_env, device='cuda')

    return vec_env

Which then calls

def create(env_name: str,
           episode_length: int = 1000,
           action_repeat: int = 1,
           clip_actions: Optional[tuple] = None,
           clip_rewards: Optional[tuple] = None,
           clip_obs: Optional[tuple] = None,
           auto_reset: bool = True,
           batch_size: Optional[int] = None,
           eval_metrics: bool = False,
           **kwargs) -> Env:
    """Creates an Env with a specified brax_custom system."""
    env = _envs[env_name](legacy_spring=True, **kwargs)
    env = FeetContactWrapper(env, env_name)
    if clip_obs:
        env = ObservationClipWrapper(env, obs_min=clip_obs[0], obs_max=clip_obs[1])
    if clip_rewards:
        env = RewardClipWrapper(env, rew_min=clip_rewards[0], rew_max=clip_rewards[1])
    if clip_actions:
        env = ActionClipWrapper(env, a_min=clip_actions[0], a_max=clip_actions[1])
    if episode_length is not None:
        env = wrappers.EpisodeWrapper(env, episode_length, action_repeat)
    if batch_size:
        env = wrappers.VectorWrapper(env, batch_size)
    if auto_reset:
        env = wrappers.AutoResetWrapper(env)
    if eval_metrics:
        env = wrappers.EvalWrapper(env)

    return env  # type: ignore

Actions are clipped to (-1, 1), and observations and rewards are clipped to (-10, 10).
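Since `auto_reset` defaults to True in `create` above, it may help to picture what an auto-reset wrapper does on `done`. The toy model below is only a sketch under assumptions, not the Brax wrapper: the state is a plain dict, the dynamics are made up, and it assumes the wrapper restores the state captured at reset instead of sampling a new one (which would fit the maintainer's remark that an unhealthy start could keep coming back).

```python
def reset(initial_z):
    """Create a toy state and remember the initial height for later auto-resets."""
    return {"z": initial_z, "first_z": initial_z, "done": False}


def step(state, dz, min_z=0.2):
    """Toy dynamics: shift the torso height by dz and flag done if it drops below min_z."""
    z = state["z"] + dz
    return {**state, "z": z, "done": z < min_z}


def auto_reset_step(state, dz):
    """Step once; on done, restore the height stored at reset (the done flag is kept)."""
    nxt = step(state, dz)
    if nxt["done"]:
        nxt = {**nxt, "z": nxt["first_z"]}
    return nxt


state = reset(0.5)
state = auto_reset_step(state, -0.4)  # drops below 0.2, so the stored height is restored
print(state)
```

In this model the caller still sees `done` on the step where the episode ended, but the next step starts again from the stored initial state.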

btaba Dec 22, 2022
Maintainer

Hi @SumeetBatra. Ah, got it, thanks for debugging! Indeed, it looks like z doesn't go below 0.2. This is a difference between the MuJoCo Ant env and Brax v1: MuJoCo only has contact between the feet and the floor, not between the torso and the floor. This is fixed in Brax v2 since we load directly from the MuJoCo XML; I encourage you to check it out here and here.

SumeetBatra Dec 22, 2022
Author

Okay, thanks for the update! Is there a rough timeline for when we can expect an official release of Brax v2? Also, are other envs supported in the preview?

btaba Dec 22, 2022
Maintainer

Hi @SumeetBatra, for a very rough timeline: in the next few months or so! Feel free to pin v0.0.16 or work off of v2; we'll eventually move all the v2 stuff into the root src folder.
We'll be adding all of the same envs as in v1 bit by bit, but we only have Ant for now.
