PyTorch/XLA 2.0 release

@miladm miladm released this 12 Aug 07:23
500e1c2

Cloud TPUs now support the PyTorch 2.0 release through the PyTorch/XLA integration. In addition to the improvements and bug fixes in PyTorch 2.0 itself, this release introduces several new features and PyTorch/XLA-specific bug fixes.

Beta Features

PJRT runtime

  • Check out our newest document; PJRT is the default runtime in 2.0.
  • New implementation of xm.rendezvous using XLA collective communication, which scales better (#4181)
  • New PJRT TPU backend through the C-API (#4077)
  • Default to PJRT if no runtime is configured (#4599)
  • Experimental support for torch.distributed and DDP on TPU v2 and v3 (#4520)
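
The default-runtime change (#4599) can be pictured as a small decision over environment variables. The sketch below is illustrative only: `select_runtime` is a hypothetical helper, not a PyTorch/XLA function. It assumes that an explicit `PJRT_DEVICE` selects PJRT, that XRT-style variables such as `XRT_TPU_CONFIG` indicate an XRT configuration, and that PJRT is used when neither is set. The library's actual checks may differ.

```python
from typing import Mapping

# Environment variables assumed (for this sketch) to indicate an XRT setup.
XRT_ENV_VARS = ("XRT_TPU_CONFIG", "XRT_DEVICE_MAP", "XRT_WORKERS")


def select_runtime(env: Mapping[str, str]) -> str:
    """Hypothetical helper: return "PJRT" or "XRT" for a given environment."""
    # An explicit PJRT device request wins.
    if env.get("PJRT_DEVICE"):
        return "PJRT"
    # Otherwise, any XRT-style variable keeps the legacy runtime.
    if any(env.get(name) for name in XRT_ENV_VARS):
        return "XRT"
    # Nothing configured: fall back to PJRT, the 2.0 default.
    return "PJRT"
```

Calling `select_runtime(os.environ)` in a fresh shell with none of these variables set would return "PJRT" under these assumptions.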

FSDP

  • Add auto_wrap_policy into XLA FSDP for automatic wrapping (#4318)

Stable Features

Lazy Tensor Core Migration

  • Migration is complete; check out this dev discussion for more details.
  • Naively inherits LazyTensor (#4271)
  • Adopt even more LazyTensor interfaces (#4317)
  • Introduce XLAGraphExecutor (#4270)
  • Inherits LazyGraphExecutor (#4296)
  • Adopt more LazyGraphExecutor virtual interfaces (#4314)
  • Rollback to use xla::Shape instead of torch::lazy::Shape (#4111)
  • Use TORCH_LAZY_COUNTER/METRIC (#4208)

Improvements & Additions

  • Add an option to increase the worker thread efficiency for data loading (#4727)
  • Improve numerical stability of torch.sigmoid (#4311)
  • Add an API to clear counters and metrics (#4109)
  • Add met.short_metrics_report to display a more concise metrics report (#4148)
  • Document environment variables (#4273)
  • Op Lowering
    • _linalg_svd (#4537)
    • upsample_bilinear2d with scale (#4464)
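
As background for the sigmoid stability item above (#4311): the textbook form 1 / (1 + exp(-x)) overflows for large negative inputs. The sketch below shows one common reformulation in plain Python. It is not taken from the PR and makes no claim about how the PyTorch/XLA lowering is implemented; both function names are illustrative.

```python
import math


def naive_sigmoid(x: float) -> float:
    # math.exp(-x) raises OverflowError once -x exceeds roughly 709.
    return 1.0 / (1.0 + math.exp(-x))


def stable_sigmoid(x: float) -> float:
    # Only exponentiate a non-positive number, so exp() stays within (0, 1].
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)
```

The two functions agree for moderate inputs; they differ only in that `stable_sigmoid` never evaluates `exp` on a large positive argument.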

Experimental Features

TorchDynamo (torch.compile) support

  • Check out our newest doc.
  • Dynamo bridge Python binding (#4119)
  • Dynamo bridge backend implementation (#4523)
  • Training optimization: make execution async (#4425)
  • Training optimization: reduce graph execution per step (#4523)

PyTorch/XLA GSPMD on single host

  • Preserve parameter sharding with sharded data placeholder (#4721)
  • Transfer shards from server to host (#4508)
  • Store the sharding annotation within XLATensor (#4390)
  • Use d2d replication for more efficient input sharding (#4336)
  • Mesh to support custom device order (#4162)
  • Introduce virtual SPMD device to avoid unpartitioned data transfer (#4091)
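
To illustrate what a custom device order means for a mesh (#4162), the sketch below lays a flat list of device IDs onto a logical grid in row-major order. `build_mesh` is a hypothetical helper written in plain Python for illustration; it is not the PyTorch/XLA `Mesh` class and does not reflect its signature.

```python
from math import prod
from typing import Dict, Sequence, Tuple


def build_mesh(
    device_ids: Sequence[int], mesh_shape: Tuple[int, ...]
) -> Dict[Tuple[int, ...], int]:
    """Map each logical mesh coordinate to a device ID (row-major layout)."""
    if len(device_ids) != prod(mesh_shape):
        raise ValueError("number of devices must equal the mesh size")
    if len(set(device_ids)) != len(device_ids):
        raise ValueError("device IDs must be unique")

    mesh = {}
    for flat_index, device in enumerate(device_ids):
        # Convert the flat position into a multi-dimensional coordinate.
        coord = []
        remainder = flat_index
        for dim in reversed(mesh_shape):
            coord.append(remainder % dim)
            remainder //= dim
        mesh[tuple(reversed(coord))] = device
    return mesh
```

With `device_ids=[0, 2, 1, 3]` and a 2x2 shape, logical position (0, 1) holds device 2 rather than device 1, which is the kind of reordering a custom device order allows.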

Ongoing development

Ongoing Dynamic Shape implementation

  • Implement missing XLASymNodeImpl::Sub (#4551)
  • Make empty_symint support dynamism (#4550)
  • Add dynamic shape support to SigmoidBackward (#4322)
  • Add a forward pass NN model with dynamism test (#4256)

Ongoing SPMD multi host execution (#4573)

Bug fixes & improvements

  • Support int as index type (#4602)
  • Only alias inputs and outputs when force_ltc_sync == True (#4575)
  • Fix race condition between execution and buffer tear down on GPU when using bfc_allocator (#4542)
  • Release the GIL during TransferFromServer (#4504)
  • Fix type annotations in FSDP (#4371)