-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[refurb
] implement hardcoded-string-charset
(FURB156)
#13530
base: main
Are you sure you want to change the base?
Conversation
c4cac13
to
40149a1
Compare
|
code | total | + violation | - violation | + fix | - fix |
---|---|---|---|---|---|
FURB156 | 3 | 3 | 0 | 0 | 0 |
crates/ruff_linter/src/rules/refurb/rules/hardcoded_string_charset.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. This looks great.
I left a few smaller comments. It would also be great to add some documentation to Charset
and bitset
, considering that the representations are non-trivial and make some assumptions about how they're used.
crates/ruff_linter/src/rules/refurb/rules/hardcoded_string_charset.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/refurb/rules/hardcoded_string_charset.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/refurb/rules/hardcoded_string_charset.rs
Outdated
Show resolved
Hide resolved
crates/ruff_linter/src/rules/refurb/rules/hardcoded_string_charset.rs
Outdated
Show resolved
Hide resolved
40149a1
to
a4a585d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for following up. Let's add a test for when the expression is parenthesized to verify that the fix is correct (if not, take a look at parenthesized_range
.
I also think that we may be able to remove the factory methods on AsciiCharSet
and reduce them to just one.
|
||
_ = "" in "1234567890" | ||
_ = "" in "12345670" | ||
_ = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test for a case where the string is parenthesized
_ = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c' | |
_ = ( | |
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c' | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
066e700
I added the test and "took a look" on parenthesized_range
, but its usage was really not intuitive for me in that special case. Could I somehow simplify "getting the parent" here?
fn from_bytes(bytes: &[u8]) -> Option<Self> { | ||
bytes | ||
.iter() | ||
.try_fold(0, |acc, &byte| byte.is_ascii().then(|| acc | (1 << byte))) | ||
.map(Self) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation for having both from_bytes
and from_bytes_const
? Aren't both returning None
if the string contains any non ascii character?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to stick to using a const
here (another option, I would like to avoid, is runtime evaluation under "OnceLock"):
const KNOWN_NAMED_CHARSETS: [NamedCharset; 9] = [ |
So I need
from_bytes
to be a const fn
, but unfortunately currently "in stable" it cannot have a clean implementation like this:ruff/crates/ruff_linter/src/rules/refurb/rules/hardcoded_string_charset.rs
Lines 62 to 65 in 066e700
bytes | |
.iter() | |
.try_fold(0, |acc, &byte| byte.is_ascii().then(|| acc | (1 << byte))) | |
.map(Self) |
So I added the dirty implementation (which can be const fn
) under another name from_bytes_const
:
ruff/crates/ruff_linter/src/rules/refurb/rules/hardcoded_string_charset.rs
Lines 70 to 85 in 066e700
const fn from_bytes_const(bytes: &[u8]) -> Option<Self> { | |
// TODO: remove in favor of [`Self::from_bytes`], when its implementation will be | |
// supported in `const` fn | |
// - https://github.com/rust-lang/rust/issues/67792 | |
// - https://github.com/rust-lang/rust/issues/87575 | |
let mut bitset = 0; | |
let mut i = 0; | |
while i < bytes.len() { | |
if !bytes[i].is_ascii() { | |
return None; | |
} | |
bitset |= 1 << bytes[i]; | |
i += 1; | |
} | |
Some(Self(bitset)) | |
} |
And there is a TODO to remove it, when stable supports const fn
like usual from_bytes
.
The third factory is about the usage of Option::unwrap
in const fn
. Currently is not possible (wait for rust-lang/rust#67441). When it will be shipped, we can write
ascii_char_set: AsciiCharSet::from_bytes_const(bytes).unwrap(),
instead of
ascii_char_set: AsciiCharSet::from_bytes_const_unwrap(bytes), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like the const version but do we need the non const version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The const version is non-idiomatic and will surprise a reader.
Having this non-idiomatic version in runtime could have some interesting behavior of optimization part of compiler.
While there is a difference in generated code ( see https://godbolt.org/z/53T3z86f8 ), I assume it is a good default decision to have two implemenation untill the const fn
supports idiomatic implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be fine with keeping the code simpler and using OnceLock
if that works personally.
Otherwise, if the non-const version isn't being used, I'd drop it. If you're worried about surprising the reader, then you could write a short comment. (Although I don't think it's that surprising. If you've ever tried to write a const fn before, you'll know that you're pretty limited in the tools you have.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
b9b3b64
Ok. AsciiCharSet
interface was simplified to the direction of only const
implementation.
Also:
from_bytes_const_unwrap
was inlined to localize the story ofOption::unwrap
inconst
context.- slightly change the
TODO
comments.
b9b3b64
to
aa0a81a
Compare
Summary
Implement
hardcoded-string-charset
(FURB156)See:
Test Plan
cargo test