Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce binary size of levenshtein #60

Merged
merged 1 commit into from
Jan 4, 2024
Merged

reduce binary size of levenshtein #60

merged 1 commit into from
Jan 4, 2024

Conversation

maxbachmann
Copy link
Member

This removes the special handling for an empty first string, which leads to an extra memory allocation in this case. However this reduces the binary size noticeably:

opt-level=3:
961B strsim strsim::levenshtein
1.1KiB strsim strsim::levenshtein

opt-level=3
888B strsim strsim::levenshtein
949B strsim strsim::levenshtein

Copy link
Member

@dguo dguo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. There's just the merge conflict with the changelog.


for (i, a_elem) in a.into_iter().enumerate() {
result = i + 1;
let mut distance_b = i;

for (j, b_elem) in b.into_iter().enumerate() {
let cost = if a_elem == b_elem { 0usize } else { 1usize };
let cost = usize::from(a_elem != b_elem);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that this change makes a difference. Just for my own learning, do you have an explanation for it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory this can make a difference because it is a branchless version. This can't lead to CPU misprediction + in this case wouldn't require the branching in the binary. E.g. when implementing a SIMD version of algorithms this would be required, since there is no branching for SIMD.

In praxis this doesn't make a difference, since this is optimized by the compiler already anyways. So this line is simply a cleanup recommended by cargo clippy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the binary size difference is only because we do no longer have to special case empty strings.

This removes the special handling for an empty first string,
which leads to an extra memory allocation in this case.
However this reduces the binary size noticeably:

opt-level=3:
961B strsim strsim::levenshtein
1.1KiB strsim strsim::levenshtein

opt-level=3
888B strsim strsim::levenshtein
949B strsim strsim::levenshtein
@maxbachmann maxbachmann merged commit f9eea82 into main Jan 4, 2024
12 checks passed
@maxbachmann maxbachmann deleted the levenshtein branch January 4, 2024 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants