Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[refurb] implement hardcoded-string-charset (FURB156) #13530

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

alex-700
Copy link
Contributor

Summary

Implement hardcoded-string-charset (FURB156)
See:

Test Plan

cargo test

Copy link
Contributor

github-actions bot commented Sep 26, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+3 -0 violations, +0 -0 fixes in 2 projects; 52 projects unchanged)

apache/airflow (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ dev/breeze/src/airflow_breeze/utils/packages.py:433:48: FURB156 [*] Use of hardcoded string charset

bokeh/bokeh (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ examples/reference/models/Text.py:9:5: FURB156 [*] Use of hardcoded string charset
+ tests/unit/bokeh/util/test_token.py:69:20: FURB156 [*] Use of hardcoded string charset

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
FURB156 3 3 0 0 0

@zanieb zanieb self-assigned this Sep 26, 2024
@MichaReiser MichaReiser added rule Implementing or modifying a lint rule preview Related to preview mode features labels Sep 27, 2024
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This looks great.

I left a few smaller comments. It would also be great to add some documentation to Charset and bitset, considering that the representations are non-trivial and make some assumptions about how they're used.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for following up. Let's add a test for when the expression is parenthesized to verify that the fix is correct (if not, take a look at parenthesized_range.

I also think that we may be able to remove the factory methods on AsciiCharSet and reduce them to just one.


_ = "" in "1234567890"
_ = "" in "12345670"
_ = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
Copy link
Member

@MichaReiser MichaReiser Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test for a case where the string is parenthesized

Suggested change
_ = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
_ = (
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

066e700
I added the test and "took a look" on parenthesized_range, but its usage was really not intuitive for me in that special case. Could I somehow simplify "getting the parent" here?

Comment on lines 60 to 65
fn from_bytes(bytes: &[u8]) -> Option<Self> {
bytes
.iter()
.try_fold(0, |acc, &byte| byte.is_ascii().then(|| acc | (1 << byte)))
.map(Self)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for having both from_bytes and from_bytes_const? Aren't both returning None if the string contains any non ascii character?

Copy link
Contributor Author

@alex-700 alex-700 Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to stick to using a const here (another option, I would like to avoid, is runtime evaluation under "OnceLock"):

const KNOWN_NAMED_CHARSETS: [NamedCharset; 9] = [

So I need from_bytes to be a const fn, but unfortunately currently "in stable" it cannot have a clean implementation like this:
bytes
.iter()
.try_fold(0, |acc, &byte| byte.is_ascii().then(|| acc | (1 << byte)))
.map(Self)

So I added the dirty implementation (which can be const fn) under another name from_bytes_const:

const fn from_bytes_const(bytes: &[u8]) -> Option<Self> {
// TODO: remove in favor of [`Self::from_bytes`], when its implementation will be
// supported in `const` fn
// - https://github.com/rust-lang/rust/issues/67792
// - https://github.com/rust-lang/rust/issues/87575
let mut bitset = 0;
let mut i = 0;
while i < bytes.len() {
if !bytes[i].is_ascii() {
return None;
}
bitset |= 1 << bytes[i];
i += 1;
}
Some(Self(bitset))
}

And there is a TODO to remove it, when stable supports const fn like usual from_bytes.

The third factory is about the usage of Option::unwrap in const fn. Currently is not possible (wait for rust-lang/rust#67441). When it will be shipped, we can write

            ascii_char_set: AsciiCharSet::from_bytes_const(bytes).unwrap(),

instead of

ascii_char_set: AsciiCharSet::from_bytes_const_unwrap(bytes),

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like the const version but do we need the non const version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The const version is non-idiomatic and will surprise a reader.

Having this non-idiomatic version in runtime could have some interesting behavior of optimization part of compiler.
While there is a difference in generated code ( see https://godbolt.org/z/53T3z86f8 ), I assume it is a good default decision to have two implemenation untill the const fn supports idiomatic implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with keeping the code simpler and using OnceLock if that works personally.

Otherwise, if the non-const version isn't being used, I'd drop it. If you're worried about surprising the reader, then you could write a short comment. (Although I don't think it's that surprising. If you've ever tried to write a const fn before, you'll know that you're pretty limited in the tools you have.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b9b3b64
Ok. AsciiCharSet interface was simplified to the direction of only const implementation.
Also:

  • from_bytes_const_unwrap was inlined to localize the story of Option::unwrap in const context.
  • slightly change the TODO comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Related to preview mode features rule Implementing or modifying a lint rule
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants