🎉 v0.5.0
Compared to v0.4.0, UtilsRL v0.5.0 provides easier-to-use and more powerful tools:
New Features
- `UtilsRL.net`: Besides the standard MLP/CNN/RNN modules, UtilsRL also provides:
  - `EnsembleLinear` & `EnsembleMLP` for network ensembling and efficient inference (a minimal sketch of the idea follows this list)
  - `NoisyLinear` for efficient exploration
  - `Attention` for self-attention network design
- `UtilsRL.rl.buffer`: We refactored the buffers and added a highly efficient (5x) Prioritized Experience Replay (PER) implemented in C++ with pybind11 (see the conceptual sampling sketch after this list).
- `UtilsRL.rl.actor` & `UtilsRL.rl.critic`: Users can now customize the output layers of actors and critics. This is extremely useful for adapting network structures, e.g. ensembling, adding layer normalization, or adding dropout (see the illustrative critic sketch below).
- `UtilsRL.logger`: We now support file, TensorBoard, and WandB loggers, and users can use `CompositeLogger` to freely combine different loggers (a fan-out sketch follows this list).
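For context, an ensemble linear layer evaluates several independently parameterized linear layers with a single batched matrix multiply, which is what makes ensemble inference cheap. The sketch below is a minimal PyTorch illustration of that idea only; the class name and constructor are made up here and do not mirror UtilsRL's actual `EnsembleLinear` interface.

```python
import torch
import torch.nn as nn

class EnsembleLinearSketch(nn.Module):
    """K independent linear layers evaluated with one batched matmul (illustrative only)."""
    def __init__(self, in_features: int, out_features: int, ensemble_size: int):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(ensemble_size, in_features, out_features))
        self.bias = nn.Parameter(torch.zeros(ensemble_size, 1, out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [ensemble_size, batch, in_features] -> [ensemble_size, batch, out_features]
        return torch.baddbmm(self.bias, x, self.weight)

# Example: 5 ensemble members, batch of 32, 64 -> 128 features
layer = EnsembleLinearSketch(64, 128, ensemble_size=5)
x = torch.randn(5, 32, 64)
print(layer(x).shape)  # torch.Size([5, 32, 128])
```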
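PER samples transitions with probability proportional to their (TD-error-based) priorities and corrects the resulting bias with importance weights. The snippet below is only a conceptual NumPy illustration of proportional sampling; UtilsRL's implementation uses a C++ sum-tree exposed through pybind11 for speed, and the function name here is hypothetical.

```python
import numpy as np

def per_sample(priorities: np.ndarray, batch_size: int, alpha: float = 0.6, beta: float = 0.4):
    """Sample indices proportionally to priority**alpha and return importance weights."""
    scaled = priorities ** alpha
    probs = scaled / scaled.sum()
    idx = np.random.choice(len(priorities), size=batch_size, p=probs)
    # Importance-sampling weights correct the bias introduced by non-uniform sampling.
    weights = (len(priorities) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights

# Example: 6 stored transitions with TD-error-based priorities
idx, w = per_sample(np.array([0.1, 2.0, 0.5, 1.5, 0.05, 3.0]), batch_size=4)
print(idx, w)
```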
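As an illustration of what customizing the output layer enables, the snippet below builds a small Q-network whose final head adds layer normalization and dropout. The `make_critic` function and its `output_layer` argument are hypothetical and are not UtilsRL's `critic` API; they only show the kind of structural modification the new customization hooks are meant for.

```python
from typing import Optional

import torch
import torch.nn as nn

def make_critic(obs_dim: int, action_dim: int, hidden: int = 256,
                output_layer: Optional[nn.Module] = None) -> nn.Module:
    """Build a simple Q-network; the caller may swap in a custom output head."""
    head = output_layer if output_layer is not None else nn.Linear(hidden, 1)
    return nn.Sequential(
        nn.Linear(obs_dim + action_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        head,
    )

# Custom head: layer normalization + dropout before the final value output.
custom_head = nn.Sequential(nn.LayerNorm(256), nn.Dropout(0.1), nn.Linear(256, 1))
critic = make_critic(obs_dim=17, action_dim=6, output_layer=custom_head)
q_values = critic(torch.randn(32, 17 + 6))  # shape: [32, 1]
```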
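The composite-logger idea is simply fan-out: one front-end object forwards each logging call to several backends. The classes below are a dependency-free sketch of that pattern with invented names and methods; they do not reflect `CompositeLogger`'s real interface.

```python
class PrintLoggerSketch:
    def log_scalar(self, tag: str, value: float, step: int) -> None:
        print(f"[{step}] {tag} = {value}")

class FileLoggerSketch:
    def __init__(self, path: str):
        self.path = path

    def log_scalar(self, tag: str, value: float, step: int) -> None:
        with open(self.path, "a") as f:
            f.write(f"{step}\t{tag}\t{value}\n")

class CompositeLoggerSketch:
    """Forward every logging call to all wrapped loggers."""
    def __init__(self, *loggers):
        self.loggers = loggers

    def log_scalar(self, tag: str, value: float, step: int) -> None:
        for logger in self.loggers:
            logger.log_scalar(tag, value, step)

logger = CompositeLoggerSketch(PrintLoggerSketch(), FileLoggerSketch("metrics.tsv"))
logger.log_scalar("loss", 0.42, step=100)
```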
New Examples
We added two examples to illustrate the pipeline of integrating UtilsRL (see `examples/`):
- PPO algorithm, MuJoCo (continuous observation space, continuous control)
- Rainbow algorithm, Atari (image input, discrete control)
Old bugs (already fixed)
- Fix: return proper value for `random_batch` method of `ReplayPool` by @mansicer in #17
- Fix argparse logic for float precision; fix interface error for ensemble linear by @typoverflow in #20
New friends
Full Changelog: v0.4.1...v0.5.0