Releases · typoverflow/UtilsRL
🎉 v0.6.0
New Features
- Transformers, for RL. We implement Transformer, GPT-2, and an experimental version of RWKV in `UtilsRL.net.attention`, providing support for incorporating these highly expressive sequence-modeling techniques into reinforcement learning.
- DMControl env wrappers (#39)
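These attention modules all build on the scaled dot-product attention operation. As an illustration of that underlying computation (a minimal NumPy sketch of the general technique, not UtilsRL's actual API):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """softmax(q k^T / sqrt(d)) v -- the core of self-attention."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d)
    # numerically stable softmax over the last axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# a trajectory of 4 timesteps with 8-dim features, attending to itself
x = np.random.default_rng(0).normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

In sequence-modeling approaches to RL, each timestep of a trajectory attends to the others this way, which is what makes these architectures attractive for credit assignment over long horizons.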
What's Changed
- Argument parsing. You can now use `--config /path/to/config` to designate the default config file from the CLI.
- Refactored loggers. We refactored the logger module, unified the interfaces, and made them handier for out-of-the-box usage.
- Removal of redundant features. We removed some features, such as the Monitor module and the callback functions during argument parsing.
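The `--config` pattern above is commonly implemented with a two-stage parse: first read only `--config`, then use the file's entries as defaults so explicit CLI flags still win. A hypothetical sketch with stdlib `argparse` (the function and flag names besides `--config` are illustrative, not UtilsRL's implementation):

```python
import argparse, json, os, tempfile

def parse_with_config(argv):
    # stage 1: extract only --config, ignore everything else
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--config", default=None)
    known, _ = pre.parse_known_args(argv)

    defaults = {}
    if known.config:
        with open(known.config) as f:
            defaults = json.load(f)

    # stage 2: full parser; config entries become defaults,
    # so explicit CLI flags still override them
    parser = argparse.ArgumentParser(parents=[pre])
    parser.add_argument("--lr", type=float, default=3e-4)
    parser.add_argument("--seed", type=int, default=0)
    parser.set_defaults(**defaults)
    return parser.parse_args(argv)

# write a throwaway config: it sets lr, while seed comes from the CLI
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"lr": 1e-3}, f)
args = parse_with_config(["--config", f.name, "--seed", "42"])
print(args.lr, args.seed)  # 0.001 42
os.unlink(f.name)
```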
Bug Fixes
- PER batch update (#31) @LyndonKong
- (Urgent) Fix broken gradient computation for SquashedGaussianActor (#28) @typoverflow
🎉 v0.5.0
Compared to v0.4.0, UtilsRL v0.5.0 provides more easy-to-use and powerful tools:
New Features
- `UtilsRL.net`: Besides the standard MLP/CNN/RNN modules, UtilsRL also provides:
  - `EnsembleLinear` & `EnsembleMLP` for network ensembles and efficient inference
  - `NoisyLinear` for efficient exploration
  - `Attention` for self-attention network design
- `UtilsRL.rl.buffer`: We refactored the buffers and provide a highly efficient (5x) Prioritized Experience Replay (PER) implemented in C++ with pybind11.
- `UtilsRL.rl.actor` & `UtilsRL.rl.critic`: Users can now customize the output layers in actors and critics. This is extremely useful for various modifications to network structures, e.g. ensembling, adding layer normalization, adding dropout, etc.
- `UtilsRL.logger`: We now support file, TensorBoard, and WandB loggers, and users can use `CompositeLogger` to freely combine different loggers.
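The data structure typically behind an efficient PER is a sum tree, which supports O(log n) priority updates and proportional sampling. A pure-Python sketch of the idea (UtilsRL's version is implemented in C++/pybind11, which is where the speedup comes from; this class is illustrative, not UtilsRL's API):

```python
class SumTree:
    """Minimal sum tree for proportional prioritized sampling."""

    def __init__(self, capacity):
        self.capacity = capacity
        # leaves [capacity, 2*capacity) hold priorities;
        # internal node i stores the sum of its two children
        self.tree = [0.0] * (2 * capacity)

    def update(self, idx, priority):
        i = idx + self.capacity
        self.tree[i] = priority
        i //= 2
        while i >= 1:           # propagate the new sum up to the root
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def sample(self, u):
        """Return the leaf whose cumulative-priority interval contains u."""
        i = 1
        while i < self.capacity:
            if u <= self.tree[2 * i]:
                i = 2 * i        # descend left
            else:
                u -= self.tree[2 * i]
                i = 2 * i + 1    # descend right, past the left subtree's mass
        return i - self.capacity

tree = SumTree(4)
for idx, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(idx, p)
print(tree.tree[1])      # total priority: 10.0
print(tree.sample(9.5))  # falls in the last leaf's interval -> 3
```

Sampling `u` uniformly from `[0, total)` then draws each transition with probability proportional to its priority, which is exactly the behavior PER needs.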
New Examples
We added two examples to illustrate the pipeline of integrating UtilsRL (see `examples/`):
- PPO algorithm, MuJoCo (continuous observation space, continuous control)
- Rainbow algorithm, Atari (image input, discrete control)
Old bugs (already fixed)
- Fix: return proper value for `random_batch` method of `ReplayPool` by @mansicer in #17
- Fix argparse logic for float precision; fix interface error for ensemble linear by @typoverflow in #20
New friends
Full Changelog: v0.4.1...v0.5.0
v0.4.0
What's Changed
- Added RL network modules, including common policy-network output heads and structures such as replay buffers and normalizers.
- Added a parsing function for TensorBoard event files; the plotting module still needs polishing.
- Added a snapshot feature.
- Added a PPO test example on Gym-MuJoCo.
- Updated the README.
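A standard way to implement the observation normalizer mentioned above is a running mean/variance kept up to date with Welford's algorithm. A minimal sketch of that technique (illustrative only, not UtilsRL's actual class):

```python
import math

class RunningNormalizer:
    """Running mean/std normalizer using Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def normalize(self, x, eps=1e-8):
        var = self.m2 / self.n if self.n > 0 else 1.0
        return (x - self.mean) / math.sqrt(var + eps)

norm = RunningNormalizer()
for x in [1.0, 2.0, 3.0]:
    norm.update(x)
print(norm.mean)  # 2.0
print(round(norm.normalize(2.0), 6))  # 0.0
```

Updating incrementally this way avoids storing the whole observation history and stays numerically stable over long training runs.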
Bug Fixes
- Member variables of the NameSpace class were added in the wrong place (#1)
- Several minor code-level adjustments.
Future Plans
The project is still under active improvement; feature suggestions and bug reports are welcome!
- Planned feature improvements are listed in the Issues section.
- Complete the framework documentation and fully rewrite the README.
Full Changelog: v0.3.13...v0.4.0
v0.2.0
Features
- Argument parsing utils
- Training process monitor
- Loggers
- Device and seed management
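Seed management usually amounts to a single helper that seeds every RNG an experiment touches, so runs are reproducible. A hypothetical sketch (the function name is illustrative, not UtilsRL's API; extend it with NumPy/PyTorch seeding as needed):

```python
import random

def seed_everything(seed):
    """Seed the RNGs used in an experiment for reproducibility."""
    random.seed(seed)
    # if NumPy / PyTorch are in use, also seed them here, e.g.:
    #   np.random.seed(seed); torch.manual_seed(seed)

seed_everything(42)
a = [random.random() for _ in range(3)]
seed_everything(42)
b = [random.random() for _ in range(3)]
print(a == b)  # True: identical seeds give identical sequences
```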