Migration to New Tensor + TensorLayout #14364
TT-BrianLiu announced in General announcements
Background
Our current way of creating tensors on device has two main drawbacks:

- We maintain two shape classes, `tt::tt_metal::LegacyShape` and `ttnn::Shape`, both of which contain information about the logical shape and the padded shape. In our stack, we mainly treat the padded shape as the actual logical shape and the logical shape as metadata, which is sometimes respected, sometimes not. There is no clear definition of what a created tensor represents.
- Concepts like dtype, layout, memory config, and padding are spread across the tensor and have implicit dependencies on each other (eg. you cannot have a `bfloat8` tensor be in `row_major` layout). Storing padding in shape and not having all these concepts consolidated makes it difficult to use and a nightmare to maintain.

New Tensor + TensorLayout
We should be thinking about tensor creation in terms of logical shape + concepts like sharding, dtype, layout, and memory config. To this end, we are introducing two new concepts, which we will slowly move the codebase over to use:
- `ttnn::SimpleShape`: a shape that carries only the logical shape. Eventually, we will replace `tt::tt_metal::LegacyShape` and `ttnn::Shape` with `ttnn::SimpleShape` and rename it back to `ttnn::Shape`.
- `TensorLayout`: consolidates dtype, layout, memory config, and alignment into a single description of how a logical shape is laid out in memory.

A couple of notes on terminology:

- `Layout` becomes a bit overloaded, since we have the new `TensorLayout` (described above) and the old `Layout` (ie. just tiny tiles, regular 32x32 tiles, or row_major). We named `TensorLayout` differently from `Layout` for now, but there could be more appropriate names for them in the future (eg. `MemoryLayout` / `Layout` + `PageConfig`).
- `alignment` is just an easier way for users to specify padding. For example, the majority of padding is there to pad up to the nearest 32 to make the height and width of a tensor tilizable (eg. 2 padded to 32 with 30 padding, 33 padded to 64 with 31 padding). Instead of the user specifying the exact padding needed, it is far more convenient to say: "Please align these dims to the nearest 32" (eg. 2 padded to 32 and 33 padded to 64 are both just an alignment of 32). It more accurately captures what you are trying to do.
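To make the split concrete, here is a minimal, hypothetical sketch of pairing a logical shape with a consolidated layout object. The class and field names below are illustrative assumptions, not the actual ttnn API.

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative stand-ins for the real ttnn enums/configs (assumed names, not the actual API).
@dataclass
class TensorLayoutSketch:
    dtype: str                              # eg. "bfloat16", "bfloat8_b"
    layout: str                             # old-style Layout: "row_major" or "tile"
    memory_config: str                      # eg. "interleaved", "height_sharded"
    alignment: Tuple[int, int] = (32, 32)   # align shard height/width up to these values

@dataclass
class TensorSpecSketch:
    logical_shape: Tuple[int, ...]          # the only shape the user reasons about
    tensor_layout: TensorLayoutSketch       # everything needed to derive the physical layout

# The user describes intent; padding is derived from the layout, never stored in the shape.
spec = TensorSpecSketch(
    logical_shape=(56, 56, 30),
    tensor_layout=TensorLayoutSketch("bfloat16", "tile", "height_sharded"),
)
```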
Sharding + Alignment
An interesting point is how sharding interacts with alignment, and it largely boils down to: "Does alignment happen before or after sharding?" We have decided that it MUST happen after sharding.
A side note (if you want to work through some examples of alignment):
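To work through the arithmetic (an added illustration, not part of the original post), here is a small sketch; `align_up` is an assumed helper name, not a ttnn function.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the nearest multiple of alignment."""
    return ((value + alignment - 1) // alignment) * alignment

# The examples from the alignment note above: an alignment of 32 covers both cases.
assert align_up(2, 32) == 32    # 2 padded to 32 (30 rows of padding)
assert align_up(33, 32) == 64   # 33 padded to 64 (31 rows of padding)

# Alignment happens after sharding: each shard is padded independently.
shard_shape = (49, 30)
aligned_shard_shape = tuple(align_up(dim, 32) for dim in shard_shape)
assert aligned_shard_shape == (64, 32)
```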
Tensor Creation
With the new `TensorLayout`, this is how we envision users creating tensors:

- The user provides a logical shape plus a `TensorLayout` (dtype, layout, memory config, and alignment).
- The logical shape is sharded according to the memory config, and each shard is aligned.
- The physical height is `aligned_shard_height * number_of_shards_along_height`.
- The physical width is `aligned_shard_width * number_of_shards_along_width`.
- The buffer is allocated based on `physical_height * physical_width`.
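The sketch below (an illustrative assumption of the flow, not the ttnn implementation) shows how a 2D physical shape could be derived from a logical shape, a shard shape, and an alignment:

```python
import math

def align_up(value: int, alignment: int) -> int:
    return ((value + alignment - 1) // alignment) * alignment

def derive_physical_shape(logical_shape, shard_shape, alignment=(32, 32)):
    """Collapse the logical shape to 2D, shard it, align each shard,
    and return the resulting 2D physical shape."""
    # Collapse all dims except the last one into a single height.
    height = math.prod(logical_shape[:-1])
    width = logical_shape[-1]

    # Number of shards along each dim (a trailing partial shard still counts as a full shard).
    shards_along_height = math.ceil(height / shard_shape[0])
    shards_along_width = math.ceil(width / shard_shape[1])

    # Alignment happens after sharding, shard by shard.
    aligned_shard_height = align_up(shard_shape[0], alignment[0])
    aligned_shard_width = align_up(shard_shape[1], alignment[1])

    return (aligned_shard_height * shards_along_height,
            aligned_shard_width * shards_along_width)
```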
Example in Resnet
Let's go through an example in Resnet where a user has a logical shape of `[56, 56, 30]` that they want to height shard into 64 `[49, 30]` shards, with the tensor in TILE layout (meaning each shard is tilized).

- The logical shape `[56, 56, 30]` is collapsed to a 2D shape of `[3136, 30]` for sharding.
- A shard shape of `[49, 30]` cuts the logical 2D shape into 64 pieces (no partial shards at the end here).
- Each `[49, 30]` shard is aligned to `[64, 32]`. This alignment can be user provided or automatically inferred based on the TILE layout (assuming full 32x32 tiles here).
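As a quick arithmetic check (added here for illustration), the numbers above line up as follows:

```python
# Logical [56, 56, 30] collapses to a 2D shape of [3136, 30] for height sharding.
assert 56 * 56 == 3136

# 64 shards of height 49 exactly cover the 3136 logical rows (no partial shards).
assert 3136 == 64 * 49

# Each [49, 30] shard is aligned up to [64, 32] (nearest multiples of 32),
# so the physical shape is the aligned shard dims times the number of shards along each dim.
physical_height = 64 * 64   # aligned_shard_height * number_of_shards_along_height
physical_width = 32 * 1     # aligned_shard_width * number_of_shards_along_width
assert (physical_height, physical_width) == (4096, 32)
```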
Implications
With this new flow, we will be able to support alignment of shards, but it also has some important implications on what you CANNOT do now:

- The Resnet example above (`[56, 56, 30]` -> cut into 64 x `[49, 30]` shards, then padded up to 64 x `[64, 32]`) has a 2D physical shape of 4096 x 32 but has no 3D representation. The closest representation could be something like a reshape into `[64, 49, 30]` -> `[64, 49[64], 30[32]]`, but you are talking about a different tensor here.
- A tensor like `[1, 4[32], 1[32], 32]` is not supported, because you cannot describe it as a logical shape + some 2D alignment (see the sketch below). For example, if the OP is `transpose_hc` and the input is `[1, 1[32], 4[32], 32]`, the output would request a tensor like `[1, 4[32], 1[32], 32]`, because the OP fundamentally treats the input as `[1, 32, 32, 32]` and the actual logical shape is essentially just metadata... Ignoring how inconsistent this is, the performance is sub-optimal to begin with, since we are doing unnecessary copies and re-tilizes to produce data that we do not even need. Today, we actually end up stripping the extra padding along C after the op anyways.

By imposing these restrictions, we hope to slowly transition our stack to only work with logical shape, which is the logical (pun intended 💩) thing to do. If an OP does not work with the original logical shape, then it does not work. Simple as that.
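To make the `[1, 4[32], 1[32], 32]` restriction concrete, here is a small added sketch (my illustration, following the post's framing of alignment as a 2D height/width alignment):

```python
import math

def align_up(value: int, alignment: int) -> int:
    return ((value + alignment - 1) // alignment) * alignment

# The requested tensor: logical [1, 4, 1, 32] with dims 1 and 2 each padded to 32,
# ie. a padded shape of [1, 32, 32, 32], whose 2D physical shape is [1024, 32].
requested_physical = (1 * 32 * 32, 32)

# Under logical shape + 2D alignment, only the collapsed height and the width are padded.
logical_shape = (1, 4, 1, 32)
height = math.prod(logical_shape[:-1])                            # 4
width = logical_shape[-1]                                         # 32
derived_physical = (align_up(height, 32), align_up(width, 32))    # (32, 32)

# With the usual tile alignment of 32, even the physical sizes differ; more generally,
# the requested padding sits between logical rows, which a 2D alignment of the
# collapsed shape cannot express.
assert derived_physical != requested_physical
```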
@TT-BrianLiu @ayerofieiev-tt @sminakov-tt On our side, we will migrate the tensor infra to the new `TensorLayout`, but on the OPs side, we will start requesting ops to transition to using logical shape. A very trackable outcome (but probably not so easily achieved) is to remove all references to padded shape in the form of:

- `get_padded_shape()`
- Dim accesses through `tt::tt_metal::LegacyShape` or `ttnn::Shape` (note: the default accessor from `tt::tt_metal::LegacyShape` returns the padded dim)