#10548: Support tile layout for width/height-sharded concat #13744
Device performance regression (https://github.com/tenstorrent/tt-metal/actions/runs/11396055725) is failing with:
However, the same failure is currently also happening on the main branch: https://github.com/tenstorrent/tt-metal/actions/runs/11399044245/job/31717674624#step:9:2739
Looks good to me, nice work.
Yeah, this was actually me. Perf numbers have improved on a bunch of models, but I didn't update the expected values with my change. The check also fails on very big improvements, to prevent the numbers from getting stale.
Hi @ayerofieiev-tt, this is another kernel ramp-up task that needs a code owner review and merge. Thanks!
Ticket
#10548
Problem description
Change kernel to support concat on both width and height-sharded tensors in tile layout. The kernel supports the following cases:
For now, it is only used to concat more than two tensors. Two-tensor concatenation is currently handled by a special unrolled kernel because of a performance issue with runtime args. This new kernel can be unrolled to replace it in the future if needed.
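To illustrate why a sharded concat can be done shard by shard, here is a minimal host-side sketch in plain Python. It is not the kernel code and uses no tt-metal or ttnn APIs. All function names are hypothetical, and it assumes 32x32 tiles, height-sharded inputs with equal shard heights, and concatenation along the width dimension. Under those assumptions, concatenating the per-core shards and gathering the result matches concatenating the full tensors.

```python
TILE = 32  # assumed tile edge length (32x32 elements per tile)


def make(rows, cols, base):
    """Build a rows x cols matrix with distinct values starting at `base`."""
    return [[base + r * cols + c for c in range(cols)] for r in range(rows)]


def split_rows(matrix, num_shards):
    """Height-shard: give each core a contiguous band of rows."""
    rows_per_shard = len(matrix) // num_shards
    return [matrix[i * rows_per_shard:(i + 1) * rows_per_shard]
            for i in range(num_shards)]


def concat_width(a, b):
    """Reference concat of two matrices along the width dimension."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def sharded_concat_width(shards_per_tensor):
    """Concat height-sharded inputs along width, one core at a time.

    shards_per_tensor[t][c] is the shard of input tensor t held by core c.
    """
    num_cores = len(shards_per_tensor[0])
    out_shards = []
    for core in range(num_cores):
        shard = shards_per_tensor[0][core]
        for tensor_shards in shards_per_tensor[1:]:
            shard = concat_width(shard, tensor_shards[core])
        out_shards.append(shard)
    return out_shards


def gather_rows(shards):
    """Reassemble a height-sharded tensor from its per-core shards."""
    return [row for shard in shards for row in shard]


# Three inputs (more than two), each 2 tiles tall, with widths of 1, 2 and 1 tiles.
a = make(2 * TILE, 1 * TILE, 0)
b = make(2 * TILE, 2 * TILE, 10_000)
c = make(2 * TILE, 1 * TILE, 20_000)

num_cores = 2  # one tile row per core
sharded_inputs = [split_rows(t, num_cores) for t in (a, b, c)]
out_shards = sharded_concat_width(sharded_inputs)

reference = concat_width(concat_width(a, b), c)
assert gather_rows(out_shards) == reference
```

Each output shard here is one tile tall and four tiles wide, so no data has to cross between cores in this configuration. The width-sharded case with concatenation along height would be the mirror image of this sketch.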
What's changed
Rename s2s_rm_concat_multi_core to s2s_concat_multi_core, to support tile layout besides the row-major layout
Rename reader_s2s_rm_tensor_concat.cpp to reader_s2s_tensor_concat.cpp