Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Snippets][CPU] Moved N_tail processing to the end in BrgemmCopyBKernel #28664

Open
wants to merge 3 commits into from

Conversation

a-sidorova
Copy link
Contributor

@a-sidorova a-sidorova commented Jan 24, 2025

Details:

  • The performance experiments (see the mentioned ticket please) show that N_Tail processing should be at the end of BrgemmCopyBKernel. The current PR moves tail processing from the beginning to the end in kernel

Tickets:

@a-sidorova a-sidorova added this to the 2025.1 milestone Jan 24, 2025
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Jan 24, 2025
@a-sidorova a-sidorova force-pushed the feature/snippets/optimized_brgemm_copy_b_kernel_tail branch from 9a47d46 to f5291d8 Compare January 27, 2025 05:53
@a-sidorova a-sidorova marked this pull request as ready for review January 28, 2025 06:25
@a-sidorova a-sidorova requested review from a team as code owners January 28, 2025 06:25
Comment on lines 79 to 80
inline T compute_LDB(T n_block, const ov::element::Type& precision) {
return compute_repacked_n_dim(n_block, precision);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need 2 different functions that do the same thing?
Should we replace all compute_LDB calls with compute_repacked_n_dim then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left the both them just to split logic into get_LDB and get_repacked_N by sense.
But the implementation is the same.

Anyway, I replaced compite_LDB with compute_repacked_n_dim in 15dadd5

const auto& precision = parent_expr->get_node()->get_input_element_type(0);
m_allocation_size = std::max(n_blk, compute_inner_n_block(precision));
}
m_allocation_size = compute_repacked_n_dim(n_blk, precision);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice that we now reuse the dynamic values handling from ov::snippets::utils::rnd_up, and don't have to replicate this logic anywhere else.

@a-sidorova a-sidorova force-pushed the feature/snippets/optimized_brgemm_copy_b_kernel_tail branch from c64f34f to 15dadd5 Compare January 31, 2025 05:09
Copy link
Contributor

@IvanNovoselov IvanNovoselov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@a-sidorova a-sidorova force-pushed the feature/snippets/optimized_brgemm_copy_b_kernel_tail branch from 15dadd5 to 6ea5aff Compare February 4, 2025 07:13
@IvanNovoselov IvanNovoselov added this pull request to the merge queue Feb 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: CPU OpenVINO CPU plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants