Backend and reproducibility considerations for AI research #539
Closed · eleninisioti started this conversation in General
-
Hi @eleninisioti, we recommend using the MJX backend, as it's the only one that is being actively developed at the moment. It's also the most feature-complete and the closest to MuJoCo. With that being said, environments in this repo were heavily tuned. Re: rng, I just ran your script and I get the same results over the 10 iterations. Are you seeing something else?
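For example, the backend is selected when the environment is constructed (a minimal sketch using `brax.envs.get_environment`; the `ant` choice here is just illustrative):

```python
from brax import envs

# Construct the same environment on different backends; 'mjx' is the one
# that is actively developed and closest to MuJoCo.
env_mjx = envs.get_environment('ant', backend='mjx')
env_gen = envs.get_environment('ant', backend='generalized')
```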
-
Hi! I have recently been using brax for research in reinforcement learning and evolutionary optimization, and I am curious about how people choose which backend to use and what the developers recommend.
Between 'spring', 'positional', and 'generalized', it seems best to go for 'generalized' for accuracy (I have seen robots such as halfcheetah break the physics with the spring backend). Between 'generalized' and 'mjx', I imagine it's best to go for 'mjx' (as 'generalized' may become deprecated?). My understanding is that the two aim at the same level of accuracy and computational complexity.
Yet I have observed that it is much harder to train agents with 'mjx' than with 'generalized'. When I run the example script for training the ant robot with PPO, I reach a reward of about 8000 with 'generalized' but only about 4000 with 'mjx'. With evolutionary optimization the differences are even more pronounced: agents sometimes do not improve at all with 'mjx', while they reach about 2000 with 'generalized'.
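For reference, the comparison I ran is roughly the following (a sketch only; the hyperparameters are illustrative and not the exact values from the example script):

```python
import functools
from brax import envs
from brax.training.agents.ppo import train as ppo

def train_ant(backend, seed=0):
    # Same environment and PPO configuration; only the backend differs.
    env = envs.get_environment('ant', backend=backend)
    train_fn = functools.partial(
        ppo.train,
        num_timesteps=25_000_000,
        num_evals=10,
        episode_length=1000,
        normalize_observations=True,
        unroll_length=5,
        num_minibatches=32,
        num_updates_per_batch=4,
        discounting=0.97,
        learning_rate=3e-4,
        entropy_cost=1e-2,
        num_envs=4096,
        batch_size=2048,
        seed=seed,
    )
    _, _, metrics = train_fn(environment=env)
    return metrics['eval/episode_reward']

for backend in ('generalized', 'mjx'):
    print(backend, train_ant(backend))
```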
What could be the reason for this difference, and which of the two backends does it suggest we should use? Perhaps one of them has a bug, one is more accurate, or the algorithms are brittle enough that they need to be tuned separately for each backend.
Another concern is reproducibility. I do not understand why, but vmapping over the environments gives different results for the same seed (a sketch of the kind of check I mean is below). Is this a bug or expected behavior?
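Here is a minimal sketch of the check (not my exact script); it assumes the brax v2 `envs` API, the `ant` environment, and a zero-action policy, all of which are just for illustration:

```python
import jax
import jax.numpy as jnp
from brax import envs

env = envs.get_environment('ant', backend='mjx')
reset_fn = jax.jit(jax.vmap(env.reset))
step_fn = jax.jit(jax.vmap(env.step))

def rollout(seed, num_envs=8, num_steps=10):
    # Reset a batch of environments from keys derived from one seed,
    # then step them with a fixed (zero) action and sum the rewards.
    keys = jax.random.split(jax.random.PRNGKey(seed), num_envs)
    state = reset_fn(keys)
    total = jnp.zeros(num_envs)
    for _ in range(num_steps):
        action = jnp.zeros((num_envs, env.action_size))
        state = step_fn(state, action)
        total += state.reward
    return total

# I would expect two runs with the same seed to match exactly.
print(jnp.allclose(rollout(0), rollout(0)))
```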
On a more general note: looking at the literature that uses environments based on the older and newer MuJoCo versions, people often do not report the backend, and sometimes not even the reward function. Since such configuration differences affect performance substantially, we should always report these choices and ideally converge on a standard for some of them, such as the backend.