When doing reinforcement learning, stochasticity is not a bug, it's a feature 😃

Stochasticity from the physics simulator

Running the same agent again is indeed not deterministic, even when the policy is fully deterministic. This is because there are other sources of stochasticity that we don't control. One of them is the physics simulator, as it performs collision detection and forward dynamics with finite precision. Even when we reset from exactly the same state, as soon as there are some impacts involved, trajectories will have variance:

./start_simulation.sh

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import gymnasium as gym
import matplotlib.pyplot as plt
import numpy as np
import upkie.envs
upkie.envs.register()
NB_RUNS = 10
NB_STEPS = 100
if __name__ == "__main__":
    with gym.make("UpkieGroundVelocity-v3", frequency=200.0) as env:
        trajectories = []
        action = 0.0 * env.action_space.sample()  # zero action of the right shape
        for i in range(NB_RUNS):
            observation, _ = env.reset()  # connects to the spine
            trajectories.append([])
            for step in range(NB_STEPS):
                pitch = observation[0]
                ground_pos = observation[1]
                ground_vel = observation[3]
                # Same deterministic linear feedback at every step
                action[0] = 10.0 * pitch + 1.0 * ground_pos + 0.1 * ground_vel
                observation, _, _, _, _ = env.step(action)
                trajectories[-1].append(observation[0])  # log the pitch angle
        dt = env.unwrapped.dt

    # One curve per run, all drawn on the same axes
    trange = np.arange(0.0, NB_STEPS * dt, dt)
    plt.ion()
    plt.grid(True)
    plt.plot(trange, np.array(trajectories).T)
    plt.ylim(-0.03, 0.03)
    plt.legend(("pitch [rad]",))

Stochasticity from the agent and simulation loops

Another source of stochasticity comes in when the agent and simulation loops are not synchronized. Here is the same example, but this time running the simulation with:

./start_simulation.sh --nb-substeps 5

Still, once the robot makes contact with the environment, there is not one solution but a distribution of them.
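Since each run is one sample from a distribution, one way to compare two policies is to repeat each of them several times and compare summary statistics rather than single runs. Here is a minimal sketch of that idea using only the Python standard library. The helper names `summarize` and `clearly_different` are hypothetical (they are not part of upkie or `ppo_balancer`), the two-standard-error threshold is an arbitrary choice, and the numbers in the example are placeholders, not measurements:

```python
import math
import statistics


def summarize(samples):
    """Return (mean, standard error of the mean) over repeated runs."""
    mean = statistics.fmean(samples)
    # Sample standard deviation (n - 1 denominator), needs at least 2 runs
    stderr = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, stderr


def clearly_different(samples_a, samples_b, nb_stderr=2.0):
    """Crude check: do the means differ by more than nb_stderr combined standard errors?"""
    mean_a, err_a = summarize(samples_a)
    mean_b, err_b = summarize(samples_b)
    combined = math.hypot(err_a, err_b)  # standard error of the difference
    return abs(mean_a - mean_b) > nb_stderr * combined


if __name__ == "__main__":
    # Placeholder values, NOT measurements: one number per run,
    # for instance the largest push each policy survived in that run.
    policy_a = [4.0, 5.0, 6.0, 5.0, 5.0]
    policy_b = [5.0, 6.0, 4.0, 5.0, 6.0]
    print("policy A:", summarize(policy_a))
    print("policy B:", summarize(policy_b))
    print("clearly different:", clearly_different(policy_a, policy_b))
```

With more runs per policy, the standard error shrinks, so smaller differences between policies become distinguishable from run-to-run noise.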
(Sorry I am not sure how to open a discussion, so I am opening an issue)
Is it possible to make 'ppo_balancer/run.py' more deterministic? I am having trouble comparing policies because there is too much randomness between runs (the sagittal push that can be applied without falling varies greatly). I have set deterministic=True and passed the seed to the functions below, but the runs are still not deterministic. Thank you!