PyTorch/XLA 2.0 release
Cloud TPUs now support the PyTorch 2.0 release via the PyTorch/XLA integration. On top of the underlying improvements and bug fixes in the PyTorch 2.0 release, this release introduces several new features and PyTorch/XLA-specific bug fixes.
Beta Features
PJRT runtime
- Check out our newest document; PJRT is the default runtime in 2.0 (a minimal usage sketch follows this list).
- New implementation of xm.rendezvous using XLA collective communication, which scales better (#4181)
- New PJRT TPU backend through the C-API (#4077)
- Default to PJRT if no runtime is configured (#4599)
- Experimental support for torch.distributed and DDP on TPU v2 and v3 (#4520)
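As a minimal sketch of the above (not taken from the release itself), the snippet below runs a small DDP step on TPU under PJRT; the `PJRT_DEVICE` setting and the `xla://` init method follow the PJRT/DDP guides, and the exact values are assumptions for your environment.

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_backend  # registers the 'xla' process group backend
import torch_xla.distributed.xla_multiprocessing as xmp

os.environ.setdefault("PJRT_DEVICE", "TPU")  # PJRT is the default runtime in 2.0


def _mp_fn(index):
    dist.init_process_group("xla", init_method="xla://")
    device = xm.xla_device()
    model = DDP(nn.Linear(128, 10).to(device), gradient_as_bucket_view=True)
    x = torch.randn(32, 128, device=device)
    model(x).sum().backward()
    xm.mark_step()  # materialize the lazily traced graph


if __name__ == "__main__":
    xmp.spawn(_mp_fn)
```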
FSDP
- Add auto_wrap_policy into XLA FSDP for automatic wrapping (#4318)
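A hedged sketch of what auto-wrapping looks like with XLA FSDP; the `size_based_auto_wrap_policy` import path and the 1M-parameter threshold are assumptions to check against the FSDP docs.

```python
from functools import partial

import torch.nn as nn
import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP
from torch_xla.distributed.fsdp.wrap import size_based_auto_wrap_policy

device = xm.xla_device()
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).to(device)

# Automatically wrap submodules above the parameter threshold as nested FSDP
# units, instead of wrapping each child module by hand.
policy = partial(size_based_auto_wrap_policy, min_num_params=10**6)
model = FSDP(model, auto_wrap_policy=policy)
```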
Stable Features
Lazy Tensor Core Migration
- Migration is complete; check out this dev discussion for more details.
- Naively inherits LazyTensor (#4271)
- Adopt even more LazyTensor interfaces (#4317)
- Introduce XLAGraphExecutor (#4270)
- Inherits LazyGraphExecutor (#4296)
- Adopt more LazyGraphExecutor virtual interfaces (#4314)
- Roll back to using xla::Shape instead of torch::lazy::Shape (#4111)
- Use TORCH_LAZY_COUNTER/METRIC (#4208)
Improvements & Additions
- Add an option to increase the worker thread efficiency for data loading (#4727)
- Improve numerical stability of torch.sigmoid (#4311)
- Add an API to clear counters and metrics (#4109)
- Add met.short_metrics_report to display a more concise metrics report (#4148; see the sketch after this list)
- Document environment variables (#4273)
- Op Lowering
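For the metrics items above, here is a short sketch of the debugging helpers; the `met.*` function names are as we understand them in `torch_xla.debug.metrics` and should be verified against the linked documentation.

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

device = xm.xla_device()
y = torch.randn(4, 4, device=device) @ torch.randn(4, 4, device=device)
xm.mark_step()  # force compilation/execution so counters are populated

print(met.short_metrics_report())  # concise report instead of the full dump
met.clear_counters()               # reset counters between experiments
met.clear_metrics()                # reset timing metrics as well
```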
Experimental Features
TorchDynamo (torch.compile) support
- Check out our newest doc (a minimal torch.compile sketch follows this list).
- Dynamo bridge Python binding (#4119)
- Dynamo bridge backend implementation (#4523)
- Training optimization: make execution async (#4425)
- Training optimization: reduce graph execution per step (#4523)
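A minimal torch.compile sketch for the Dynamo bridge; the backend string `torchxla_trace_once` is the inference backend name used around this release (an assumption to verify against the linked doc, since later releases rename it).

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(64, 64).to(device).eval()

# Compile the forward function through the Dynamo -> XLA bridge.
compiled = torch.compile(lambda x: model(x), backend="torchxla_trace_once")

x = torch.randn(8, 64, device=device)
with torch.no_grad():
    out = compiled(x)
```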
PyTorch/XLA GSPMD on a single host
- Preserve parameter sharding with sharded data placeholder (#4721)
- Transfer shards from server to host (#4508)
- Store the sharding annotation within XLATensor (#4390)
- Use d2d replication for more efficient input sharding (#4336)
- Mesh to support custom device order (#4162)
- Introduce virtual SPMD device to avoid unpartitioned data transfer (#4091)
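Below is a sketch of single-host GSPMD sharding under this experimental API; the `torch_xla.experimental.xla_sharding` module path, the `XLA_USE_SPMD` flag, and the 2x4 mesh are assumptions to verify against the SPMD documentation.

```python
import os
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.experimental.xla_sharding as xs
from torch_xla.experimental.xla_sharding import Mesh

os.environ["XLA_USE_SPMD"] = "1"  # route tensors through the virtual SPMD device

num_devices = 8  # e.g. a single TPU v3-8 host
# Mesh accepts a custom device order; the default ordering is used here.
mesh = Mesh(np.arange(num_devices), (2, 4), ("x", "y"))

t = torch.randn(16, 128, device=xm.xla_device())
# Shard dim 0 across mesh axis 0 and dim 1 across mesh axis 1.
xs.mark_sharding(t, mesh, (0, 1))
```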
Ongoing development
Ongoing Dynamic Shape implementation
- Implement missing XLASymNodeImpl::Sub (#4551)
- Make empty_symint support dynamism (#4550)
- Add dynamic shape support to SigmoidBackward (#4322)
- Add a forward pass NN model with dynamism test (#4256)