
Twin Delayed DDPG (TD3) and GAE bug fix (TRPO, PPO1, GAIL)

Released by @araffin on 31 Jul (commit 8ceda3b)

New Features

  • added the Twin Delayed DDPG (TD3) algorithm, with HER support (see the usage sketch after this list)
  • added support for continuous action spaces to action_probability, computing the PDF of a Gaussian policy in addition to the existing support for categorical stochastic policies
  • added a flag to action_probability to return log-probabilities; both changes are shown in the second sketch below
  • added support for Python lists and NumPy arrays in logger.writekvs (@dwiel)
  • the info dicts returned by VecEnvs now include a terminal_observation key giving access to the last observation of an episode (@qxcv); see the third sketch below
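
A minimal usage sketch for the new TD3 implementation, following the TD3 example from the documentation (the environment, noise scale, and timestep budget are illustrative choices):

```python
import gym
import numpy as np

from stable_baselines import TD3
from stable_baselines.ddpg.noise import NormalActionNoise

env = gym.make('Pendulum-v0')

# TD3 has a deterministic policy, so exploration noise is added to its actions
n_actions = env.action_space.shape[-1]
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))

model = TD3('MlpPolicy', env, action_noise=action_noise, verbose=1)
model.learn(total_timesteps=50000)
model.save('td3_pendulum')
```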
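A sketch of the extended action_probability on a continuous action space, here with PPO2 and its Gaussian policy; the actions and logp keyword names follow the documented signature, so treat them as assumptions if your version differs:

```python
import gym
from stable_baselines import PPO2

env = gym.make('Pendulum-v0')  # continuous action space
model = PPO2('MlpPolicy', env)

obs = env.reset()
action = env.action_space.sample()

# PDF of the Gaussian policy evaluated at `action` for this observation
prob = model.action_probability(obs, actions=action)
# The same quantity returned as a log-probability via the new flag
log_prob = model.action_probability(obs, actions=action, logp=True)
```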
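A sketch of reading the new terminal_observation key: since VecEnvs reset automatically, obs already belongs to the next episode when done is True, so the final observation has to be read from the info dict:

```python
import gym
from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make('CartPole-v1')])
obs = env.reset()

for _ in range(1000):
    obs, rewards, dones, infos = env.step([env.action_space.sample()])
    if dones[0]:
        # `obs` is already the first observation of the next episode;
        # the actual last observation of the finished one is stored here:
        terminal_obs = infos[0]['terminal_observation']
```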

Bug Fixes

  • fixed a bug in traj_segment_generator where episode_starts was recorded incorrectly, leading to a wrong Generalized Advantage Estimation (GAE) computation; this affects TRPO, PPO1, and GAIL (thanks to @miguelrass for spotting the bug). A sketch of the estimator follows this list.
  • added the missing n_batch property to BasePolicy
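
For context on the GAE fix, here is a minimal standalone sketch of the estimator (illustrative only, not the library's traj_segment_generator code): a misaligned episode_starts flag zeroes the bootstrap term at the wrong steps, corrupting the advantages of whole episodes.

```python
import numpy as np

def gae(rewards, values, episode_starts, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one segment of experience.

    episode_starts[t] is True iff step t begins a new episode, so
    (1 - episode_starts[t + 1]) masks the bootstrap across episode
    boundaries. last_value is the value estimate for the step after the
    segment (use 0 if the segment ends exactly at a terminal state).
    """
    n_steps = len(rewards)
    starts = np.append(episode_starts, False)
    vpred = np.append(values, last_value)
    advantages = np.zeros(n_steps)
    last_gae = 0.0
    for step in reversed(range(n_steps)):
        nonterminal = 1.0 - float(starts[step + 1])
        delta = rewards[step] + gamma * vpred[step + 1] * nonterminal - vpred[step]
        last_gae = delta + gamma * lam * nonterminal * last_gae
        advantages[step] = last_gae
    return advantages
```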

Others

  • renamed some keys in traj_segment_generator to be more meaningful
  • retrieve the unnormalized reward when using the Monitor wrapper with TRPO, PPO1, and GAIL, so that the mean episode reward shown in the logs is the true one (see the sketch after this list)
  • cleaned up the DDPG code (renamed variables)
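
A sketch of the setup this change targets: wrap the raw env in Monitor before any reward normalization, so the logs report the true mean episode reward (the environment and hyperparameters are placeholders):

```python
import gym
from stable_baselines import PPO1
from stable_baselines.bench import Monitor
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

def make_env():
    # Monitor sees the raw rewards, before VecNormalize rescales them
    return Monitor(gym.make('Pendulum-v0'), filename=None)

env = VecNormalize(DummyVecEnv([make_env]))

model = PPO1('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10000)  # logs now report the unnormalized mean episode reward
```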

Documentation

  • fixed the documentation of the hyperparameter tuning command in the RL Zoo
  • added an example of how to log additional values with TensorBoard using a callback (sketched below)
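
The new example is along these lines, adapted from the documentation (the tag name, log directory, and random value are placeholders):

```python
import numpy as np
import tensorflow as tf
from stable_baselines import SAC

model = SAC('MlpPolicy', 'Pendulum-v0', tensorboard_log='/tmp/sac/', verbose=1)

def callback(locals_, globals_):
    self_ = locals_['self']
    # Log an additional scalar (a random value as a stand-in)
    value = np.random.random()
    summary = tf.Summary(value=[tf.Summary.Value(tag='random_value', simple_value=value)])
    locals_['writer'].add_summary(summary, self_.num_timesteps)
    return True  # returning False would stop training

model.learn(total_timesteps=50000, callback=callback)
```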