Backend and reproducibility considerations for AI research #539
Closed · eleninisioti started this conversation in General
-
Hi @eleninisioti, we recommend using the MJX backend, as it's the only one that is being actively developed at the moment. It's also the most feature-complete and the closest to MuJoCo. With that being said, environments in this repo were heavily tuned. Re: rng, I just ran your script and I get the same results over the 10 iterations. Are you seeing something else?
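For example, the backend is selected when the environment is constructed (a minimal sketch using `brax.envs.get_environment`; the `ant` choice here is just illustrative):

```python
from brax import envs

# Construct the same environment on different backends; 'mjx' is the one
# that is actively developed and closest to MuJoCo.
env_mjx = envs.get_environment('ant', backend='mjx')
env_gen = envs.get_environment('ant', backend='generalized')
```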
-
Hi! I have recently been using brax for research in reinforcement learning and evolutionary optimization, and I am curious about how people choose which backend to use and what the developers recommend.
Between 'spring', 'positional', and 'generalized', it seems best to go for 'generalized' for accuracy (I have seen robots such as halfcheetah break the physics with the spring backend). Between 'generalized' and 'mjx', I imagine it's best to go for 'mjx' (as 'generalized' may become deprecated?). My understanding is that the two aim at the same level of accuracy and computational complexity.
Yet I have observed that it is much harder to train agents with 'mjx' than with 'generalized'. When I run the example script for training the ant robot with PPO, I reach a reward of about 8000 with 'generalized' but only about 4000 with 'mjx'. With evolutionary optimization the differences are even more pronounced: agents sometimes do not improve at all with 'mjx', while they reach about 2000 with 'generalized'.
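For reference, the comparison I ran is roughly the following (a sketch only; the hyperparameters are illustrative and not the exact values from the example script):

```python
import functools
from brax import envs
from brax.training.agents.ppo import train as ppo

def train_ant(backend, seed=0):
    # Same environment and PPO configuration; only the backend differs.
    env = envs.get_environment('ant', backend=backend)
    train_fn = functools.partial(
        ppo.train,
        num_timesteps=25_000_000,
        num_evals=10,
        episode_length=1000,
        normalize_observations=True,
        unroll_length=5,
        num_minibatches=32,
        num_updates_per_batch=4,
        discounting=0.97,
        learning_rate=3e-4,
        entropy_cost=1e-2,
        num_envs=4096,
        batch_size=2048,
        seed=seed,
    )
    _, _, metrics = train_fn(environment=env)
    return metrics['eval/episode_reward']

for backend in ('generalized', 'mjx'):
    print(backend, train_ant(backend))
```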
What could be the reason for this difference, and which of the two backends does it suggest we should use? Perhaps one of them has a bug, one is more accurate, or the algorithms are brittle enough that they need to be tuned separately for each backend.
Another concern is reproducibility. I do not understand why, but vmapping over the environments gives different results for the same seed (a sketch of the kind of check I mean is below). Is this a bug or expected behavior?
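Here is a minimal sketch of the check (not my exact script); it assumes the brax v2 `envs` API, the `ant` environment, and a zero-action policy, all of which are just for illustration:

```python
import jax
import jax.numpy as jnp
from brax import envs

env = envs.get_environment('ant', backend='mjx')
reset_fn = jax.jit(jax.vmap(env.reset))
step_fn = jax.jit(jax.vmap(env.step))

def rollout(seed, num_envs=8, num_steps=10):
    # Reset a batch of environments from keys derived from one seed,
    # then step them with a fixed (zero) action and sum the rewards.
    keys = jax.random.split(jax.random.PRNGKey(seed), num_envs)
    state = reset_fn(keys)
    total = jnp.zeros(num_envs)
    for _ in range(num_steps):
        action = jnp.zeros((num_envs, env.action_size))
        state = step_fn(state, action)
        total += state.reward
    return total

# I would expect two runs with the same seed to match exactly.
print(jnp.allclose(rollout(0), rollout(0)))
```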
On a more general note: looking at the literature that uses environments based on the older and newer MuJoCo versions, people often do not report the backend, and sometimes not even the reward function. Since such configuration differences affect performance substantially, we should always report these choices and ideally converge on a standard for some of them, such as the backend.