Releases · typoverflow/UtilsRL
🎉 v0.6.0
New Features
- Transformers, for RL. We implement Transformer, GPT-2, and an experimental version of RWKV in `UtilsRL.net.attention`, providing support for incorporating these highly expressive sequence-modeling techniques into reinforcement learning.
- DMControl env wrappers (#39)
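These attention modules all build on the scaled dot-product attention operation. As an illustration of that underlying computation (a minimal NumPy sketch of the general technique, not UtilsRL's actual API):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """softmax(q k^T / sqrt(d)) v -- the core of self-attention."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d)
    # numerically stable softmax over the last axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# a trajectory of 4 timesteps with 8-dim features, attending to itself
x = np.random.default_rng(0).normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

In sequence-modeling approaches to RL, each timestep of a trajectory attends to the others this way, which is what makes these architectures attractive for credit assignment over long horizons.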
What's Changed
- Argument parsing. You can now use `--config /path/to/config` to designate the default config file from the CLI.
- Refactored loggers. We refactored the logger module, unified the interfaces, and made them handier for out-of-the-box usage.
- Removal of redundant features. We removed some features, such as the Monitor module and the callback functions during argument parsing.
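The `--config` pattern above is commonly implemented with a two-stage parse: first read only `--config`, then use the file's entries as defaults so explicit CLI flags still win. A hypothetical sketch with stdlib `argparse` (the function and flag names besides `--config` are illustrative, not UtilsRL's implementation):

```python
import argparse, json, os, tempfile

def parse_with_config(argv):
    # stage 1: extract only --config, ignore everything else
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--config", default=None)
    known, _ = pre.parse_known_args(argv)

    defaults = {}
    if known.config:
        with open(known.config) as f:
            defaults = json.load(f)

    # stage 2: full parser; config entries become defaults,
    # so explicit CLI flags still override them
    parser = argparse.ArgumentParser(parents=[pre])
    parser.add_argument("--lr", type=float, default=3e-4)
    parser.add_argument("--seed", type=int, default=0)
    parser.set_defaults(**defaults)
    return parser.parse_args(argv)

# write a throwaway config: it sets lr, while seed comes from the CLI
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"lr": 1e-3}, f)
args = parse_with_config(["--config", f.name, "--seed", "42"])
print(args.lr, args.seed)  # 0.001 42
os.unlink(f.name)
```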
Bug Fixes
- PER batch update (#31) @LyndonKong
- (Urgent) Fix broken gradient computation for SquashedGaussianActor (#28) @typoverflow
🎉 v0.5.0
Compared to v0.4.0, UtilsRL v0.5.0 provides more easy-to-use and powerful tools:
New Features
- `UtilsRL.net`: Besides the standard MLP/CNN/RNN modules, UtilsRL also provides:
  - `EnsembleLinear` & `EnsembleMLP` for network ensembles and efficient inference
  - `NoisyLinear` for efficient exploration
  - `Attention` for self-attention network design
- `UtilsRL.rl.buffer`: We refactored the buffers and provide a highly efficient (5x) Prioritized Experience Replay (PER) implemented in C++ with pybind11.
- `UtilsRL.rl.actor` & `UtilsRL.rl.critic`: Users can now customize the output layers in actors and critics. This is extremely useful for various modifications to network structures, e.g. ensembling, adding layer normalization, adding dropout, etc.
- `UtilsRL.logger`: We now support file, TensorBoard, and WandB loggers, and users can use `CompositeLogger` to freely combine different loggers.
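The data structure typically behind an efficient PER is a sum tree, which supports O(log n) priority updates and proportional sampling. A pure-Python sketch of the idea (UtilsRL's version is implemented in C++/pybind11, which is where the speedup comes from; this class is illustrative, not UtilsRL's API):

```python
class SumTree:
    """Minimal sum tree for proportional prioritized sampling."""

    def __init__(self, capacity):
        self.capacity = capacity
        # leaves [capacity, 2*capacity) hold priorities;
        # internal node i stores the sum of its two children
        self.tree = [0.0] * (2 * capacity)

    def update(self, idx, priority):
        i = idx + self.capacity
        self.tree[i] = priority
        i //= 2
        while i >= 1:           # propagate the new sum up to the root
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def sample(self, u):
        """Return the leaf whose cumulative-priority interval contains u."""
        i = 1
        while i < self.capacity:
            if u <= self.tree[2 * i]:
                i = 2 * i        # descend left
            else:
                u -= self.tree[2 * i]
                i = 2 * i + 1    # descend right, past the left subtree's mass
        return i - self.capacity

tree = SumTree(4)
for idx, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(idx, p)
print(tree.tree[1])      # total priority: 10.0
print(tree.sample(9.5))  # falls in the last leaf's interval -> 3
```

Sampling `u` uniformly from `[0, total)` then draws each transition with probability proportional to its priority, which is exactly the behavior PER needs.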
New Examples
We added two examples to illustrate the pipeline of integrating UtilsRL (see `examples/`):
- PPO algorithm, MuJoCo (continuous observation space, continuous control)
- Rainbow algorithm, Atari (image input, discrete control)
Old bugs (already fixed)
- Fix: return proper value for `random_batch` method of `ReplayPool` by @mansicer in #17
- Fix argparse logic for float precision; fix interface error for ensemble linear by @typoverflow in #20
New friends
Full Changelog: v0.4.1...v0.5.0
v0.4.0
What's Changed
- Added RL network modules, including common policy-network output heads and structures such as replay buffers and normalizers.
- Added a parsing function for TensorBoard event files; the plotting module still needs polishing.
- Added a snapshot feature.
- Added a PPO test example on Gym-MuJoCo.
- Updated the README.
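A standard way to implement the observation normalizer mentioned above is a running mean/variance kept up to date with Welford's algorithm. A minimal sketch of that technique (illustrative only, not UtilsRL's actual class):

```python
import math

class RunningNormalizer:
    """Running mean/std normalizer using Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def normalize(self, x, eps=1e-8):
        var = self.m2 / self.n if self.n > 0 else 1.0
        return (x - self.mean) / math.sqrt(var + eps)

norm = RunningNormalizer()
for x in [1.0, 2.0, 3.0]:
    norm.update(x)
print(norm.mean)  # 2.0
print(round(norm.normalize(2.0), 6))  # 0.0
```

Updating incrementally this way avoids storing the whole observation history and stays numerically stable over long training runs.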
Bug Fixes
- Member variables of the NameSpace class were added in the wrong place (#1)
- Several minor code-level adjustments.
Future Plans
The project is still under active improvement; feature suggestions and bug reports are welcome!
- Planned feature improvements are listed in the Issues section.
- Complete the framework documentation and fully rewrite the README.
Full Changelog: v0.3.13...v0.4.0
v0.2.0
Features
- Argument parsing utils
- Training process monitor
- Loggers
- Device and seed management
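Seed management usually amounts to a single helper that seeds every RNG an experiment touches, so runs are reproducible. A hypothetical sketch (the function name is illustrative, not UtilsRL's API; extend it with NumPy/PyTorch seeding as needed):

```python
import random

def seed_everything(seed):
    """Seed the RNGs used in an experiment for reproducibility."""
    random.seed(seed)
    # if NumPy / PyTorch are in use, also seed them here, e.g.:
    #   np.random.seed(seed); torch.manual_seed(seed)

seed_everything(42)
a = [random.random() for _ in range(3)]
seed_everything(42)
b = [random.random() for _ in range(3)]
print(a == b)  # True: identical seeds give identical sequences
```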