Move the sharding axes from dimensions that need replication to batch dimensions, such that we replace an all-gather
with an all-to-all
.
#62291
Loading