[Snippets][CPU] Moved N_tail processing to the end in BrgemmCopyBKernel #28664
9a47d46 to f5291d8
...c/transformations/snippets/x64/pass/lowered/expressions/brgemm_copy_b_buffer_expressions.cpp
inline T compute_LDB(T n_block, const ov::element::Type& precision) {
    return compute_repacked_n_dim(n_block, precision);
Why do we need two different functions that do the same thing? Should we replace all `compute_LDB` calls with `compute_repacked_n_dim` then?
I left both of them just to split the logic into `get_LDB` and `get_repacked_N` by meaning, but the implementation is the same. Anyway, I replaced `compute_LDB` with `compute_repacked_n_dim` in 15dadd5.
const auto& precision = parent_expr->get_node()->get_input_element_type(0);
m_allocation_size = std::max(n_blk, compute_inner_n_block(precision));
}
m_allocation_size = compute_repacked_n_dim(n_blk, precision);
It's nice that we now reuse the dynamic values handling from `ov::snippets::utils::rnd_up` and don't have to replicate this logic anywhere else.
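To make the point about dynamic values concrete, here is a minimal, self-contained sketch of a round-up helper that propagates a "dynamic" sentinel. It is not the OpenVINO implementation: `kDynamic`, `is_dynamic`, `round_up_or_dynamic` and `repacked_n_dim_sketch` are hypothetical names, and the assumption that the repacked N dimension is the N block rounded up to the inner N block is inferred from this thread, not confirmed by it.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical sentinel meaning "this dimension is not known yet".
constexpr std::size_t kDynamic = std::numeric_limits<std::size_t>::max();

constexpr bool is_dynamic(std::size_t v) {
    return v == kDynamic;
}

// Rounds `value` up to the nearest multiple of `multiple`.
// If either argument is dynamic, the result stays dynamic.
// Precondition: `multiple` is non-zero.
constexpr std::size_t round_up_or_dynamic(std::size_t value, std::size_t multiple) {
    return (is_dynamic(value) || is_dynamic(multiple))
               ? kDynamic
               : (value + multiple - 1) / multiple * multiple;
}

// Hypothetical stand-in for compute_repacked_n_dim. The inner N block is
// passed explicitly here instead of being derived from an element type.
constexpr std::size_t repacked_n_dim_sketch(std::size_t n_block, std::size_t inner_n_block) {
    return round_up_or_dynamic(n_block, inner_n_block);
}

// Compile-time checks of the behaviour described above.
static_assert(repacked_n_dim_sketch(10, 16) == 16, "small block is padded to one inner block");
static_assert(repacked_n_dim_sketch(40, 16) == 48, "block is padded to a multiple of the inner block");
static_assert(repacked_n_dim_sketch(kDynamic, 16) == kDynamic, "dynamic input stays dynamic");
```

With the dynamic check living in the rounding helper, a caller such as the allocation-size computation would not need its own branch for unknown dimensions.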
c64f34f to 15dadd5
👍
15dadd5 to 6ea5aff
Details:
`N_tail` processing should be at the end of `BrgemmCopyBKernel`. The current PR moves tail processing from the beginning to the end in the kernel.
Tickets:
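As a rough illustration of the ordering described under Details, the sketch below plans a walk over the N dimension in which all full blocks come first and the tail chunk comes last. It is only a model of the iteration order, not the kernel code: `plan_n_chunks` and its parameters are hypothetical, and the real kernel's blocking may differ.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns (offset, width) pairs in the order a copy loop would visit them:
// every full block of width `n_blk` first, then the remaining tail (if any).
std::vector<std::pair<std::size_t, std::size_t>> plan_n_chunks(std::size_t N, std::size_t n_blk) {
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    if (n_blk == 0) {
        return chunks;  // nothing sensible to plan without a block size
    }
    const std::size_t full_blocks = N / n_blk;
    const std::size_t tail = N % n_blk;

    // Full blocks are processed first.
    for (std::size_t i = 0; i < full_blocks; ++i) {
        chunks.emplace_back(i * n_blk, n_blk);
    }
    // The tail is processed last, starting right after the final full block.
    if (tail != 0) {
        chunks.emplace_back(full_blocks * n_blk, tail);
    }
    return chunks;
}
```

For example, `plan_n_chunks(70, 32)` yields the chunks (0, 32), (32, 32) and (64, 6), so the narrower tail is the final step.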