🎉 v0.5.0
Compared to v0.4.0, UtilsRL v0.5.0 provides easier-to-use and more powerful tools:
New Features
- `UtilsRL.net`: Besides the standard MLP/CNN/RNN modules, UtilsRL also provides:
  - `EnsembleLinear` & `EnsembleMLP` for network ensembling and efficient inference (a minimal sketch of the idea follows this list)
  - `NoisyLinear` for efficient exploration
  - `Attention` for self-attention network design
- `UtilsRL.rl.buffer`: We refactored the buffers and added a highly efficient (5x) Prioritized Experience Replay (PER) implemented in C++ with pybind11 (see the conceptual sampling sketch after this list).
- `UtilsRL.rl.actor` & `UtilsRL.rl.critic`: Users can now customize the output layers of actors and critics. This is extremely useful for adapting network structures, e.g. ensembling, adding layer normalization, or adding dropout (see the illustrative critic sketch below).
- `UtilsRL.logger`: We now support file, TensorBoard, and WandB loggers, and users can use `CompositeLogger` to freely combine different loggers (a fan-out sketch follows this list).
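For context, an ensemble linear layer evaluates several independently parameterized linear layers with a single batched matrix multiply, which is what makes ensemble inference cheap. The sketch below is a minimal PyTorch illustration of that idea only; the class name and constructor are made up here and do not mirror UtilsRL's actual `EnsembleLinear` interface.

```python
import torch
import torch.nn as nn

class EnsembleLinearSketch(nn.Module):
    """K independent linear layers evaluated with one batched matmul (illustrative only)."""
    def __init__(self, in_features: int, out_features: int, ensemble_size: int):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(ensemble_size, in_features, out_features))
        self.bias = nn.Parameter(torch.zeros(ensemble_size, 1, out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [ensemble_size, batch, in_features] -> [ensemble_size, batch, out_features]
        return torch.baddbmm(self.bias, x, self.weight)

# Example: 5 ensemble members, batch of 32, 64 -> 128 features
layer = EnsembleLinearSketch(64, 128, ensemble_size=5)
x = torch.randn(5, 32, 64)
print(layer(x).shape)  # torch.Size([5, 32, 128])
```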
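PER samples transitions with probability proportional to their (TD-error-based) priorities and corrects the resulting bias with importance weights. The snippet below is only a conceptual NumPy illustration of proportional sampling; UtilsRL's implementation uses a C++ sum-tree exposed through pybind11 for speed, and the function name here is hypothetical.

```python
import numpy as np

def per_sample(priorities: np.ndarray, batch_size: int, alpha: float = 0.6, beta: float = 0.4):
    """Sample indices proportionally to priority**alpha and return importance weights."""
    scaled = priorities ** alpha
    probs = scaled / scaled.sum()
    idx = np.random.choice(len(priorities), size=batch_size, p=probs)
    # Importance-sampling weights correct the bias introduced by non-uniform sampling.
    weights = (len(priorities) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights

# Example: 6 stored transitions with TD-error-based priorities
idx, w = per_sample(np.array([0.1, 2.0, 0.5, 1.5, 0.05, 3.0]), batch_size=4)
print(idx, w)
```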
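As an illustration of what customizing the output layer enables, the snippet below builds a small Q-network whose final head adds layer normalization and dropout. The `make_critic` function and its `output_layer` argument are hypothetical and are not UtilsRL's `critic` API; they only show the kind of structural modification the new customization hooks are meant for.

```python
from typing import Optional

import torch
import torch.nn as nn

def make_critic(obs_dim: int, action_dim: int, hidden: int = 256,
                output_layer: Optional[nn.Module] = None) -> nn.Module:
    """Build a simple Q-network; the caller may swap in a custom output head."""
    head = output_layer if output_layer is not None else nn.Linear(hidden, 1)
    return nn.Sequential(
        nn.Linear(obs_dim + action_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        head,
    )

# Custom head: layer normalization + dropout before the final value output.
custom_head = nn.Sequential(nn.LayerNorm(256), nn.Dropout(0.1), nn.Linear(256, 1))
critic = make_critic(obs_dim=17, action_dim=6, output_layer=custom_head)
q_values = critic(torch.randn(32, 17 + 6))  # shape: [32, 1]
```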
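The composite-logger idea is simply fan-out: one front-end object forwards each logging call to several backends. The classes below are a dependency-free sketch of that pattern with invented names and methods; they do not reflect `CompositeLogger`'s real interface.

```python
class PrintLoggerSketch:
    def log_scalar(self, tag: str, value: float, step: int) -> None:
        print(f"[{step}] {tag} = {value}")

class FileLoggerSketch:
    def __init__(self, path: str):
        self.path = path

    def log_scalar(self, tag: str, value: float, step: int) -> None:
        with open(self.path, "a") as f:
            f.write(f"{step}\t{tag}\t{value}\n")

class CompositeLoggerSketch:
    """Forward every logging call to all wrapped loggers."""
    def __init__(self, *loggers):
        self.loggers = loggers

    def log_scalar(self, tag: str, value: float, step: int) -> None:
        for logger in self.loggers:
            logger.log_scalar(tag, value, step)

logger = CompositeLoggerSketch(PrintLoggerSketch(), FileLoggerSketch("metrics.tsv"))
logger.log_scalar("loss", 0.42, step=100)
```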
New Examples
We added two examples to illustrate the pipeline of integrating UtilsRL (see `examples/`):
- PPO algorithm, MuJoCo (continuous observation space, continuous control)
- Rainbow algorithm, Atari (image input, discrete control)
Old bugs (already fixed)
- Fix: return proper value for `random_batch` method of `ReplayPool` by @mansicer in #17
- Fix argparse logic for float precision; fix interface error for ensemble linear by @typoverflow in #20
New friends
Full Changelog: v0.4.1...v0.5.0