
Releases: hill-a/stable-baselines

Flexible Custom MLP Policies + bug fixes

05 Dec 19:25
  • added support for storing the model in a file-like object (thanks to @erniejunior)
  • fixed wrong image detection when using Tensorboard logging with DQN
  • fixed a bug in PPO2 when passing a non-callable learning rate after loading
  • fixed Tensorboard logging in PPO2 when nminibatches=1
  • added early stopping via the callback return value (@erniejunior)
  • added more flexible custom MLP policies (@erniejunior); see the sketch below
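
A minimal sketch of the two user-facing additions, assuming the stable-baselines API of this release: a custom MLP architecture declared through `net_arch` on `FeedForwardPolicy`, and early stopping by returning False from the training callback. The layer sizes and step cutoff are illustrative, not from the release notes.

```python
from stable_baselines import PPO2
from stable_baselines.common.policies import FeedForwardPolicy

# Custom MLP: one shared 128-unit layer, then separate 64-unit heads
# for the policy (pi) and the value function (vf).
class CustomPolicy(FeedForwardPolicy):
    def __init__(self, *args, **kwargs):
        super(CustomPolicy, self).__init__(*args, **kwargs,
                                           net_arch=[128, dict(pi=[64], vf=[64])],
                                           feature_extraction="mlp")

state = {'calls': 0}

def stop_callback(locals_, globals_):
    # Returning False from the callback aborts training early.
    state['calls'] += 1
    return state['calls'] < 1000  # illustrative cutoff

model = PPO2(CustomPolicy, 'CartPole-v1', verbose=1)
model.learn(total_timesteps=100000, callback=stop_callback)
```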

Video Recorder

18 Nov 12:33
  • added VecVideoRecorder to record MP4 videos of the environment (example below)
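
A short usage sketch of the new wrapper, assuming `VecVideoRecorder` from `stable_baselines.common.vec_env`; the output folder, trigger, and video length are illustrative, and encoding relies on gym's video recorder (ffmpeg).

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv, VecVideoRecorder

env = DummyVecEnv([lambda: gym.make('CartPole-v1')])
# Start recording a 200-step MP4 every 10000 steps (illustrative values).
env = VecVideoRecorder(env, './videos/',
                       record_video_trigger=lambda step: step % 10000 == 0,
                       video_length=200)

model = PPO2('MlpPolicy', env)
model.learn(total_timesteps=20000)
env.close()
```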

Hotfix PPO2

07 Nov 19:59
  • Hotfix for PPO2: the wrong placeholder was used for the value function

Note: this bug has been present since v1.0, so we recommend updating to the latest version of stable-baselines.

New VecEnv Features

06 Nov 20:45
  • added the async_eigen_decomp parameter for ACKTR and set it to False by default (removes the deprecation warnings)
  • added methods for calling env methods and setting attributes inside a VecEnv (thanks to @bjmuld); see the sketch below
  • updated the minimum gym version
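
A minimal sketch of the new VecEnv access methods, assuming the `env_method` / `get_attr` / `set_attr` names used by stable-baselines; the attribute written in the last line is hypothetical.

```python
import gym
from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make('CartPole-v1') for _ in range(2)])

# Call a method on every wrapped environment and collect the results.
seeds = env.env_method('seed', 0)

# Read an attribute from all wrapped environments.
specs = env.get_attr('spec')

# Set a (hypothetical) attribute on all wrapped environments.
env.set_attr('my_flag', True)
```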

Contributors (since v2.0.0):

Thanks to @bjmuld @iambenzo @iandanforth @r7vme @brendenpetersen @huvar

Clean up dependencies + bug fix

20 Oct 08:28
  • fixed MpiAdam synchronization issue in PPO1 (thanks to @brendenpetersen), issue #50
  • fixed dependency issues (the new mujoco-py requires a MuJoCo license, and gym broke the MultiDiscrete space shape)

Bug fixes

02 Oct 12:35

WARNING: this version contains breaking changes; please read the full details.

  • added a patch fixing the equality check for gym.spaces.MultiDiscrete and gym.spaces.MultiBinary
  • fixes for DQN action_probability (example below)
  • re-added Double DQN and refactored the DQN policies (breaking changes)
  • replaced async with async_eigen_decomp in ACKTR/KFAC for Python 3.7 compatibility (async became a reserved keyword in Python 3.7)
  • removed action clipping for the prediction of continuous actions (see issue #36)
  • fixed a NaN issue caused by clipping the continuous action in the wrong place (issue #36)
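
A short sketch of the action_probability call touched by this release, assuming the DQN API (the training length is illustrative):

```python
from stable_baselines import DQN

model = DQN('MlpPolicy', 'CartPole-v1')
model.learn(total_timesteps=1000)  # illustrative length

obs = model.env.reset()
# Probability of each discrete action for this observation.
probs = model.action_probability(obs)

# Note: for continuous-action algorithms, predict() no longer clips the
# action; clip to env.action_space.low/high yourself if your env needs it.
```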

Tensorboard, refactoring and bug fixes

18 Sep 09:19

WARNING: this version contains breaking changes; please read the full details.

  • Renamed DeepQ to DQN (breaking change)
  • Renamed DeepQPolicy to DQNPolicy (breaking change)
  • fixed DDPG behavior (breaking change)
  • changed the default policies for DDPG, so that DDPG now works correctly (breaking change)
  • added more documentation (some modules from common).
  • added doc about using custom env
  • added Tensorboard support for A2C, ACER, ACKTR, DDPG, DQN, PPO1, PPO2 and TRPO (see the sketch after this list)
  • added episode reward to Tensorboard
  • added documentation for Tensorboard usage
  • added an Identity test environment for the Box action space
  • fixed the render function ignoring parameters when using wrapped environments
  • fixed PPO1 and TRPO done values for recurrent policies
  • fixed image normalization not occurring when using images
  • updated VecEnv objects for the new Gym version
  • added test for DDPG
  • refactored DQN policies
  • added a registry for policies, so that a policy can be passed to the agent as a string
  • added documentation for custom policies + policy registration
  • fixed numpy warning when using DDPG Memory
  • fixed DummyVecEnv not copying the observation array when stepping and resetting
  • added pre-built docker images + installation instructions
  • added a deterministic argument to the predict function
  • added an assert in PPO2 for recurrent policies
  • fixed the predict function to handle both vectorized and unwrapped environments
  • added an input check to the predict function
  • refactored ActorCritic models to reduce code duplication
  • refactored Off Policy models (to begin HER and replay_buffer refactoring)
  • added tests for auto vectorization detection
  • fixed the render function to handle positional arguments
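
A combined sketch of three of the additions above, assuming the stable-baselines API of this release: Tensorboard logging via `tensorboard_log`, policy registration via `register_policy`, and the `deterministic` flag on `predict`. The policy layout, log directory, and training length are illustrative.

```python
from stable_baselines import A2C
from stable_baselines.common.policies import FeedForwardPolicy, register_policy

# A small custom policy, registered so it can be passed by name.
class TwoLayerPolicy(FeedForwardPolicy):
    def __init__(self, *args, **kwargs):
        super(TwoLayerPolicy, self).__init__(*args, **kwargs,
                                             layers=[32, 32],
                                             feature_extraction="mlp")

register_policy('TwoLayerPolicy', TwoLayerPolicy)

# tensorboard_log writes TensorBoard summaries to the given directory.
model = A2C('TwoLayerPolicy', 'CartPole-v1', tensorboard_log='./a2c_tb/')
model.learn(total_timesteps=10000)

obs = model.env.reset()
# deterministic=True returns the mode of the action distribution
# instead of sampling from it.
action, _states = model.predict(obs, deterministic=True)
```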

Bug fixes and documentation

29 Aug 11:52
  • added HTML documentation using Sphinx + integration with Read the Docs
  • cleaned up README + fixed typos
  • fixed normalization for DQN with images
  • fixed DQN identity test

Refactored Stable Baselines

20 Aug 15:01
  • refactored A2C, ACER, ACKTR, DDPG, DeepQ, GAIL, TRPO, PPO1 and PPO2 under a single consistent class
  • added callback to refactored algorithm training
  • added saving and loading to the refactored algorithms (see the sketch after this list)
  • refactored ACER, DDPG, GAIL, PPO1 and TRPO to fit with A2C, PPO2 and ACKTR policies
  • added new policies for most algorithms (Mlp, MlpLstm, MlpLnLstm, Cnn, CnnLstm and CnnLnLstm)
  • added dynamic environment switching (so continual RL learning is now feasible)
  • added prediction from observation and action probability from observation for all the algorithms
  • fixed graph issues, so model names won't collide
  • fixed behavior_clone weight loading for GAIL
  • fixed Tensorflow using all the GPU VRAM
  • fixed models so that they are all compatible with vectorized environments
  • fixed set_global_seed to also update the random seed of gym.spaces
  • fixed PPO1 and TRPO performance issues when learning identity function
  • added new tests for loading, saving, continuous actions and learning the identity function
  • fixed DQN wrapping for Atari
  • added saving and loading for the VecNormalize wrapper
  • added automatic detection of action space (for the policy network)
  • fixed ACER buffer with constant values assuming n_stack=4
  • fixed some RL algorithms not clipping the action to be in the action_space, when using gym.spaces.Box
  • refactored algorithms can take either a gym.Env or a str (if the environment name is registered with gym)
  • Hotfix in ACER (compared to v1.0.0)
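
A minimal sketch of the unified interface, assuming the refactored API and the import paths of later stable-baselines versions; the file name and training length are illustrative.

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

# The environment can be a gym.Env instance or a registered id string.
model = PPO2(MlpPolicy, 'CartPole-v1', verbose=1)
model.learn(total_timesteps=10000)  # illustrative length

model.save('ppo2_cartpole')  # illustrative file name

# Loading restores the policy weights and hyperparameters.
env = gym.make('CartPole-v1')
loaded = PPO2.load('ppo2_cartpole', env=env)
obs = env.reset()
action, _states = loaded.predict(obs)
```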

Future Work:

  • Finish refactoring HER
  • Refactor ACKTR and ACER to support continuous actions

v1.0.0

20 Aug 13:23
Pre-release

Do not use: this release contains a bug in ACER, fixed in v1.0.1.