Migration to New Tensor + TensorLayout #14364
TT-BrianLiu announced in General announcements
Background
Our current way of creating tensors on device has two main drawbacks:

- We maintain two shape classes, `tt::tt_metal::LegacyShape` and `ttnn::Shape`, both of which contain information about the logical shape and the padded shape. In our stack, we mainly treat the padded shape as the actual logical shape and the logical shape as metadata, which is sometimes respected, sometimes not. There is no clear definition of what a created tensor represents.
- Concepts like dtype, layout, memory config, and padding are spread across the tensor and have implicit dependencies on each other (eg. you cannot have a `bfloat8` tensor be in `row_major` layout). Storing padding in shape and not having all these concepts consolidated makes it difficult to use and a nightmare to maintain.

New Tensor + TensorLayout
We should be thinking about tensor creation in terms of logical shape + concepts like sharding, dtype, layout, and memory config. To this end, we are introducing two new concepts, which we will slowly move the codebase over to use:
- `ttnn::SimpleShape`: a shape that carries only the logical shape. Eventually, we will replace `tt::tt_metal::LegacyShape` and `ttnn::Shape` with `ttnn::SimpleShape` and rename it back to `ttnn::Shape`.
- `TensorLayout`: consolidates dtype, layout, memory config, and alignment into a single description of how a logical shape is laid out in memory.

A couple of notes on terminology:

- `Layout` becomes a bit overloaded, since we have the new `TensorLayout` (described above) and the old `Layout` (ie. just tiny tiles, regular 32x32 tiles, or row_major). We named `TensorLayout` differently from `Layout` for now, but there could be more appropriate names for them in the future (eg. `MemoryLayout` / `Layout` + `PageConfig`).
- `alignment` is just an easier way for users to specify padding. For example, the majority of padding is there to pad up to the nearest 32 to make the height and width of a tensor tilizable (eg. 2 padded to 32 with 30 padding, 33 padded to 64 with 31 padding). Instead of the user specifying the exact padding needed, it is far more convenient to say: "Please align these dims to the nearest 32" (eg. 2 padded to 32 and 33 padded to 64 are both just an alignment of 32). It more accurately captures what you are trying to do.
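To make the split concrete, here is a minimal, hypothetical sketch of pairing a logical shape with a consolidated layout object. The class and field names below are illustrative assumptions, not the actual ttnn API.

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative stand-ins for the real ttnn enums/configs (assumed names, not the actual API).
@dataclass
class TensorLayoutSketch:
    dtype: str                              # eg. "bfloat16", "bfloat8_b"
    layout: str                             # old-style Layout: "row_major" or "tile"
    memory_config: str                      # eg. "interleaved", "height_sharded"
    alignment: Tuple[int, int] = (32, 32)   # align shard height/width up to these values

@dataclass
class TensorSpecSketch:
    logical_shape: Tuple[int, ...]          # the only shape the user reasons about
    tensor_layout: TensorLayoutSketch       # everything needed to derive the physical layout

# The user describes intent; padding is derived from the layout, never stored in the shape.
spec = TensorSpecSketch(
    logical_shape=(56, 56, 30),
    tensor_layout=TensorLayoutSketch("bfloat16", "tile", "height_sharded"),
)
```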
Sharding + Alignment
An interesting point is how sharding interacts with alignment, and it largely boils down to: "Does alignment happen before or after sharding?" We have decided that it MUST happen after sharding.
A side note (if you want to work through some examples of alignment):
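To work through the arithmetic (an added illustration, not part of the original post), here is a small sketch; `align_up` is an assumed helper name, not a ttnn function.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the nearest multiple of alignment."""
    return ((value + alignment - 1) // alignment) * alignment

# The examples from the alignment note above: an alignment of 32 covers both cases.
assert align_up(2, 32) == 32    # 2 padded to 32 (30 rows of padding)
assert align_up(33, 32) == 64   # 33 padded to 64 (31 rows of padding)

# Alignment happens after sharding: each shard is padded independently.
shard_shape = (49, 30)
aligned_shard_shape = tuple(align_up(dim, 32) for dim in shard_shape)
assert aligned_shard_shape == (64, 32)
```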
Tensor Creation
With the new `TensorLayout`, this is how we envision users creating tensors:

- The user provides a logical shape plus a `TensorLayout` (dtype, layout, memory config, and alignment).
- The logical shape is sharded according to the memory config, and each shard is aligned.
- The physical height is `aligned_shard_height * number_of_shards_along_height`.
- The physical width is `aligned_shard_width * number_of_shards_along_width`.
- The buffer is allocated based on `physical_height * physical_width`.
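The sketch below (an illustrative assumption of the flow, not the ttnn implementation) shows how a 2D physical shape could be derived from a logical shape, a shard shape, and an alignment:

```python
import math

def align_up(value: int, alignment: int) -> int:
    return ((value + alignment - 1) // alignment) * alignment

def derive_physical_shape(logical_shape, shard_shape, alignment=(32, 32)):
    """Collapse the logical shape to 2D, shard it, align each shard,
    and return the resulting 2D physical shape."""
    # Collapse all dims except the last one into a single height.
    height = math.prod(logical_shape[:-1])
    width = logical_shape[-1]

    # Number of shards along each dim (a trailing partial shard still counts as a full shard).
    shards_along_height = math.ceil(height / shard_shape[0])
    shards_along_width = math.ceil(width / shard_shape[1])

    # Alignment happens after sharding, shard by shard.
    aligned_shard_height = align_up(shard_shape[0], alignment[0])
    aligned_shard_width = align_up(shard_shape[1], alignment[1])

    return (aligned_shard_height * shards_along_height,
            aligned_shard_width * shards_along_width)
```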
Example in Resnet
Let's go through an example in Resnet where a user has a logical shape of `[56, 56, 30]` that they want to height shard into 64 `[49, 30]` shards, with the tensor in TILE layout (meaning each shard is tilized).

- The logical shape `[56, 56, 30]` is collapsed to a 2D shape of `[3136, 30]` for sharding.
- A shard shape of `[49, 30]` cuts the logical 2D shape into 64 pieces (no partial shards at the end here).
- Each `[49, 30]` shard is aligned to `[64, 32]`. This alignment can be user provided or automatically inferred based on the TILE layout (assuming full 32x32 tiles here).
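As a quick arithmetic check (added here for illustration), the numbers above line up as follows:

```python
# Logical [56, 56, 30] collapses to a 2D shape of [3136, 30] for height sharding.
assert 56 * 56 == 3136

# 64 shards of height 49 exactly cover the 3136 logical rows (no partial shards).
assert 3136 == 64 * 49

# Each [49, 30] shard is aligned up to [64, 32] (nearest multiples of 32),
# so the physical shape is the aligned shard dims times the number of shards along each dim.
physical_height = 64 * 64   # aligned_shard_height * number_of_shards_along_height
physical_width = 32 * 1     # aligned_shard_width * number_of_shards_along_width
assert (physical_height, physical_width) == (4096, 32)
```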
Implications
With this new flow, we will be able to support alignment of shards, but it also has some important implications on what you CANNOT do now:

- The Resnet example above (`[56, 56, 30]` -> cut into 64 x `[49, 30]` shards, then padded up to 64 x `[64, 32]`) has a 2D physical shape of 4096 x 32 but has no 3D representation. The closest representation could be something like a reshape into `[64, 49, 30]` -> `[64, 49[64], 30[32]]`, but you are talking about a different tensor here.
- A tensor like `[1, 4[32], 1[32], 32]` is not supported, because you cannot describe it as a logical shape + some 2D alignment (see the sketch below). For example, if the OP is `transpose_hc` and the input is `[1, 1[32], 4[32], 32]`, the output would request a tensor like `[1, 4[32], 1[32], 32]`, because the OP fundamentally treats the input as `[1, 32, 32, 32]` and the actual logical shape is essentially just metadata... Ignoring how inconsistent this is, the performance is sub-optimal to begin with, since we are doing unnecessary copies and re-tilizes to produce data that we do not even need. Today, we actually end up stripping the extra padding along C after the op anyways.

By imposing these restrictions, we hope to slowly transition our stack to only work with logical shape, which is the logical (pun intended 💩) thing to do. If an OP does not work with the original logical shape, then it does not work. Simple as that.
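To make the `[1, 4[32], 1[32], 32]` restriction concrete, here is a small added sketch (my illustration, following the post's framing of alignment as a 2D height/width alignment):

```python
import math

def align_up(value: int, alignment: int) -> int:
    return ((value + alignment - 1) // alignment) * alignment

# The requested tensor: logical [1, 4, 1, 32] with dims 1 and 2 each padded to 32,
# ie. a padded shape of [1, 32, 32, 32], whose 2D physical shape is [1024, 32].
requested_physical = (1 * 32 * 32, 32)

# Under logical shape + 2D alignment, only the collapsed height and the width are padded.
logical_shape = (1, 4, 1, 32)
height = math.prod(logical_shape[:-1])                            # 4
width = logical_shape[-1]                                         # 32
derived_physical = (align_up(height, 32), align_up(width, 32))    # (32, 32)

# With the usual tile alignment of 32, even the physical sizes differ; more generally,
# the requested padding sits between logical rows, which a 2D alignment of the
# collapsed shape cannot express.
assert derived_physical != requested_physical
```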
@TT-BrianLiu @ayerofieiev-tt @sminakov-tt On our side, we will migrate the tensor infra to the new `TensorLayout`, but on the OPs side, we will start requesting ops to transition to using logical shape. A very trackable outcome (but probably not so easily achieved) is to remove all references to padded shape in the form of:

- `get_padded_shape()`
- Dim accesses through `tt::tt_metal::LegacyShape` or `ttnn::Shape` (note: the default accessor from `tt::tt_metal::LegacyShape` returns the padded dim)