#10548: Support tile layout for width/height-sharded concat #13744

Merged
ayerofieiev-tt merged 3 commits into main from jerrysky3/i-10548
Oct 22, 2024

Conversation

jerrysky3
Contributor

@jerrysky3 jerrysky3 commented Oct 11, 2024

Ticket

#10548

Problem description

Change the sharded concat kernel to support both width-sharded and height-sharded tensors in tile layout. The kernel now supports the following cases:

  • Height-sharded width concat in row-major/tile layout
  • Width-sharded height concat in row-major/tile layout

For now it is only used to concatenate more than two tensors. Two-tensor concatenation is currently handled by a special unrolled kernel because of a performance issue with runtime args. This new kernel could be unrolled to replace it in the future if needed.
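The two supported cases can be pictured with a small host-side model. This is only a sketch under stated assumptions: plain Python lists stand in for device tensors, each shard stands in for the data held by one core, and every function name below is hypothetical (none of them is part of tt-metal). It ignores row-major versus tile layout entirely and only shows why each case can be handled shard by shard, without moving data between cores.

```python
# Illustrative model only. Lists stand in for tensors; names are hypothetical
# and do not correspond to any tt-metal API.

def height_shard(rows, num_shards):
    """Split a 2D list into contiguous blocks of rows, one block per core."""
    per_shard = len(rows) // num_shards
    assert per_shard * num_shards == len(rows), "height must divide evenly"
    return [rows[i * per_shard:(i + 1) * per_shard] for i in range(num_shards)]


def width_shard(rows, num_shards):
    """Split a 2D list into contiguous blocks of columns, one block per core."""
    width = len(rows[0])
    per_shard = width // num_shards
    assert per_shard * num_shards == width, "width must divide evenly"
    return [[row[i * per_shard:(i + 1) * per_shard] for row in rows]
            for i in range(num_shards)]


def concat_width_on_height_shards(sharded_inputs):
    """Width concat of height-sharded inputs: per core, join rows side by side."""
    num_shards = len(sharded_inputs[0])
    output = []
    for core in range(num_shards):
        local = [inp[core] for inp in sharded_inputs]  # this core's shard of each input
        joined_rows = []
        for r in range(len(local[0])):
            joined = []
            for shard in local:
                joined.extend(shard[r])
            joined_rows.append(joined)
        output.append(joined_rows)
    return output


def concat_height_on_width_shards(sharded_inputs):
    """Height concat of width-sharded inputs: per core, stack rows top to bottom."""
    num_shards = len(sharded_inputs[0])
    output = []
    for core in range(num_shards):
        stacked = []
        for inp in sharded_inputs:
            stacked.extend(inp[core])
        output.append(stacked)
    return output
```

In both functions the output shard on a core depends only on the input shards that already live on that core, which is the property the two listed cases share.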

What's changed

  • Rename s2s_rm_concat_multi_core to s2s_concat_multi_core and extend it to support tile layout in addition to row-major layout
  • Rename reader_s2s_rm_tensor_concat.cpp to reader_s2s_tensor_concat.cpp

Checklist

@jerrysky3
Contributor Author

Device performance regression (https://github.com/tenstorrent/tt-metal/actions/runs/11396055725) is failing with:

AssertionError: Some model(s) AVG DEVICE KERNEL SAMPLES/S are faster than expected, see above for details. {'AVG DEVICE KERNEL SAMPLES/S': [('ttnn_distilbert8_distilbert-base-uncased-distilled-squad', 90.4624, 20.394), ('ttnn_functional_ttnn_vgg11_1_', 270.5544, 108.871), ('ttnn_functional_ttnn_vgg16_1_', 195.0345, 95.378)]}

However it is currently also happening on the main branch: https://github.com/tenstorrent/tt-metal/actions/runs/11399044245/job/31717674624#step:9:2739

Contributor

@sjameelTT sjameelTT left a comment


Looks good to me, nice work.

@sjameelTT
Contributor


Yeah, this was actually me. Perf numbers have improved on a bunch of models, but I didn't update them with my change, and we also fail on very big improvements (to prevent the numbers from getting stale).
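The "fail on very big improvements too" behaviour amounts to a two-sided tolerance check. The following is a minimal sketch of that idea, not the actual CI check: the function name, the 5% tolerance, and the reading of each reported tuple as (model, measured, expected) are all assumptions made for illustration.

```python
# Hypothetical two-sided perf check; the tolerance value and the names are
# assumptions, not taken from the tt-metal CI scripts.

def check_perf(measured, expected, tolerance=0.05):
    """Return 'slower', 'faster', or 'ok' relative to the expected value."""
    lower = expected * (1 - tolerance)
    upper = expected * (1 + tolerance)
    if measured < lower:
        return "slower"  # a regression
    if measured > upper:
        return "faster"  # the expected value is stale and needs updating
    return "ok"


# Assuming the tuples in the assertion message are (model, measured, expected):
status = check_perf(90.4624, 20.394)
```

Under those assumptions, `status` is `"faster"` for the distilbert entry, which would explain why the check fails on main as well until the expected numbers are updated.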

@jerrysky3
Contributor Author

Hi @ayerofieiev-tt, this is another kernel ramp-up task that needs a code owner review and merge. Thanks!

@ayerofieiev-tt ayerofieiev-tt merged commit 0326577 into main Oct 22, 2024
7 checks passed
@ayerofieiev-tt ayerofieiev-tt deleted the jerrysky3/i-10548 branch October 22, 2024 00:06
ct-clmsn pushed a commit to ct-clmsn/tt-metal that referenced this pull request Nov 12, 2024
tenstorrent#13744)

Co-authored-by: Artem Yerofieiev <169092593+ayerofieiev-tt@users.noreply.github.com>
3 participants