v1.10

@ptrendx ptrendx released this 11 Sep 21:40

Release Notes – Release 1.10

Key Features and Enhancements

  • [pyTorch] Added an option to use keyword arguments with CUDA graphs.
  • [pyTorch] Implemented a new load-balanced offloading algorithm that maximizes utilization of the CPU/GPU interconnect bandwidth.
  • [pyTorch] Added support for multi-latent attention.
  • [pyTorch] Added additional documentation, scripts, and benchmarks for the attention backend.
  • [pyTorch] Added context-parallel implementation with KV allgather for causal attention.
  • [pyTorch] Added support for data type casting in the fused Adam kernel.
  • [pyTorch] Added arguments for cumulative and maximum sequence lengths to the TransformerLayer and MultiheadAttention APIs.
  • [pyTorch] Added support for padding mask in unfused backend for dot product attention.
  • [pyTorch] Expanded operation support in the fusion API (transformer_engine.pytorch.ops).
  • [pyTorch] Made several improvements to reduce CPU overhead during execution.
  • [PaddlePaddle] Added an option to run dot product attention deterministically.
  • [JAX] Added support for non-deterministic algorithms in the cuDNN flash attention backend for improved performance.
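
For readers unfamiliar with the cumulative and maximum sequence lengths mentioned above, the following is a minimal sketch of how such values are commonly derived when variable-length sequences are packed back to back. It uses plain Python only and does not call Transformer Engine; the helper names `cumulative_seqlens` and `max_seqlen` are hypothetical and chosen for illustration, and the exact argument names and tensor types the library expects should be taken from its API documentation.

```python
from itertools import accumulate


def cumulative_seqlens(seq_lens):
    """Return offsets [0, l0, l0+l1, ...] for sequences packed back to back.

    Hypothetical helper for illustration; not part of any library API.
    """
    return [0, *accumulate(seq_lens)]


def max_seqlen(seq_lens):
    """Return the length of the longest sequence (0 for an empty batch)."""
    return max(seq_lens, default=0)


# Three sequences of lengths 3, 5, and 2 packed into one buffer of 10 tokens.
seq_lens = [3, 5, 2]
cu_seqlens = cumulative_seqlens(seq_lens)

print(cu_seqlens)            # [0, 3, 8, 10]
print(max_seqlen(seq_lens))  # 5

# Sequence i occupies tokens cu_seqlens[i] up to (not including) cu_seqlens[i + 1].
for i in range(len(seq_lens)):
    start, end = cu_seqlens[i], cu_seqlens[i + 1]
    assert end - start == seq_lens[i]
```

In practice these values would typically be converted to integer tensors on the target device before being passed to an attention module; that step is omitted here to keep the sketch dependency-free.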

Fixed Issues

  • [pyTorch] Fixed miscellaneous bugs in communication-gemm overlap with userbuffers.
  • [pyTorch] Removed an additional copy of weights stored when using CPU offloading.
  • [pyTorch] Fixed a crash when running non-causal training with context parallelism.
  • [pyTorch] Fixed the calculation of tensor parallel size when using MQA/GQA.
  • [pyTorch] Fixed a crash when using context parallelism with the THD format.
  • [pyTorch] Fixed a crash in CUDA graphs when skipping warm-up iterations.
  • [pyTorch] Fixed a bug in TransformerLayer for the cross attention case where arguments were incorrectly propagated to DotProductAttention.
  • [C] Hid arbitrary symbols exposed globally in the shared object in order to avoid symbol conflict errors, which could cause a crash during library loading and imports.

Known Issues in This Release

There are no known issues in this release.

Breaking Changes in This Release

There are no breaking changes in this release.

Deprecated Features

There are no deprecated features in this release.