Question About Ant Behavior #277
SumeetBatra started this conversation in General
Hi @SumeetBatra, the environment should terminate when |
Hi folks, I have a question about how the Ant environment behaves early in training. I'm training RL policies with 2-layer MLPs using PPO and noticed that the rewards become quite negative at first, before the policy begins to learn. I understand that this could be due to any number of differences in my PPO implementation, hyperparameters, model architecture, etc. However, when I visualize a randomly initialized policy, I see that the ant sometimes flips over and keeps accumulating large negative rewards until the episode hits the timeout. Here's a screenshot that shows what's happening.
Is this correct behavior on the environment side? I would have thought there would be some termination condition if the ant flips over like that so that this doesn't continue until timeout termination. Or maybe I'm missing something.
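For illustration, a flip-based termination condition of the kind described above could look like the minimal sketch below. It is not the environment's actual termination logic; the function names, the height band defaults, and the (w, x, y, z) quaternion order are all assumptions. The idea is that a height check alone may not catch a flip if the torso stays inside the height band, so the sketch also checks whether the torso's local up axis points below the horizon.

```python
def body_up_z(quat):
    """World-frame z component of the torso's local +z (up) axis.

    Assumption: `quat` is a unit quaternion in (w, x, y, z) order.
    """
    _, x, y, _ = quat
    # Entry (row 3, column 3) of the rotation matrix built from the quaternion.
    return 1.0 - 2.0 * (x * x + y * y)


def should_terminate(torso_height, quat, min_height=0.2, max_height=1.0, min_up_z=0.0):
    """Hypothetical termination rule (not the library's API).

    Ends the episode if the torso leaves a height band, or if its up axis
    points below the horizon, i.e. the ant has flipped over.
    """
    unhealthy_height = not (min_height <= torso_height <= max_height)
    flipped = body_up_z(quat) < min_up_z
    return unhealthy_height or flipped


# Upright torso (identity rotation) at a normal height: keep going.
print(should_terminate(0.5, (1.0, 0.0, 0.0, 0.0)))  # False
# Torso rotated 180 degrees about the x axis (upside down): terminate.
print(should_terminate(0.5, (0.0, 1.0, 0.0, 0.0)))  # True
```

Setting `min_up_z` above 0 would end episodes earlier, once the torso tilts past a chosen angle rather than only when it is fully inverted.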