
feat(core): start support for hex numbers #553

Open
wants to merge 5 commits into base: master

Conversation

hippietrail
Contributor

towards #543

This is based on the existing `lex_number`, which stops matching when the digits run out. That is fine for decimal numbers, because a unit of measurement or similar might follow immediately. With a hex number, though, a non-hex letter might follow and go unflagged if it is a single letter. And if the token is badly malformed, its start still won't be flagged as long as it begins with `0x` and at least one hex digit. Examples will help illustrate:

image
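To make the concern concrete, here is a minimal sketch of a greedy, prefix-style hex lexer. It is not the code from this PR: the name `lex_hex_greedy`, the `&[char]` input, and the lowercase-only `0x` prefix are assumptions chosen for illustration, modelled on the "cut off when the digits finish" behaviour described above.

```rust
/// Hypothetical greedy hex lexer (illustration only, not the PR's code).
/// Returns the number of chars matched: `0x` plus the longest run of hex
/// digits. It stops at the first non-hex char and ignores what follows.
fn lex_hex_greedy(source: &[char]) -> Option<usize> {
    // Require the literal prefix "0x" (lowercase only, by assumption).
    if source.len() < 3 || source[0] != '0' || source[1] != 'x' {
        return None;
    }

    // Count the hex digits that follow the prefix.
    let digits = source[2..]
        .iter()
        .take_while(|c| c.is_ascii_hexdigit())
        .count();

    // "0x" with no digits at all is not a number.
    if digits == 0 {
        None
    } else {
        Some(2 + digits)
    }
}
```

With this approach `0x1Fg` matches four characters and leaves a lone `g` behind, and `0x12xyz` also matches four characters and leaves `xyz`, so the leading `0x12` would be accepted as a number even though the token as a whole is malformed.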

So just a draft PR to solicit thoughts and mods.

@hippietrail hippietrail marked this pull request as ready for review February 1, 2025 05:34
@hippietrail
Contributor Author

I'm still figuring out GitHub. I made this PR a draft because it's not ready to be merged.

But I think "draft" also means "not ready for review", and I specifically want it reviewed, to get feedback on what it should do with non-hex text "stuck to" the hex, and perhaps on how to change the code to comply with whatever is decided, especially if we want to flag such "hex with appendages".

If my understanding of draft status for a PR is still wrong, please let me know.

Collaborator

@elijah-potter elijah-potter left a comment

@hippietrail, the code looks good. Would you mind adding some test cases under harper-core/tests/test_sources?

I would add a couple Markdown files that should be considered "clean" with this lexer added. Make sure you actually import them in harper-core/tests/run_tests.rs.

@hippietrail
Contributor Author

hippietrail commented Feb 1, 2025

> @hippietrail, the code looks good. Would you mind adding some test cases under harper-core/tests/test_sources?
>
> I would add a couple Markdown files that should be considered "clean" with this lexer added. Make sure you actually import them in harper-core/tests/run_tests.rs.

Are you sure? Did you see my concerns about what to do regarding non-hex text immediately after the hex with no space etc between?


After some thought I came to grok the problem more completely and refactored the code. It is now written in the more old-school style I'm used to for lexing, which makes the tricky cases easier to think about.

I added positive and negative tests each in a loop. Let me know if this is a good or bad idea.

Clear behaviour on non-hex appended to valid hex.
Added test cases.

Note that `lex_number` will still match the leading `0` of a hex number with non-hex appended, though. That should be very rare, but it would be counterintuitive for the user.
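The note above can be illustrated with a small sketch. This is an assumption about the behaviour being described, not the PR's actual implementation: `lex_hex_strict` and `lex_decimal_prefix` are hypothetical names, and the rule "reject when a letter or digit directly follows the hex digits" is one possible reading of "clear behaviour on non-hex appended to valid hex".

```rust
/// Hypothetical strict hex lexer (illustration only).
/// Matches `0x` plus hex digits, but rejects the whole token when a
/// letter or digit is stuck directly onto the end of the hex digits.
fn lex_hex_strict(source: &[char]) -> Option<usize> {
    if source.len() < 3 || source[0] != '0' || source[1] != 'x' {
        return None;
    }

    let digits = source[2..]
        .iter()
        .take_while(|c| c.is_ascii_hexdigit())
        .count();
    if digits == 0 {
        return None;
    }

    let end = 2 + digits;
    match source.get(end) {
        // Something alphanumeric is glued on: not a clean hex number.
        Some(c) if c.is_alphanumeric() => None,
        // End of input, whitespace, or punctuation: accept.
        _ => Some(end),
    }
}

/// Hypothetical stand-in for a decimal lexer that stops when the
/// digits run out, as `lex_number` is described as doing.
fn lex_decimal_prefix(source: &[char]) -> Option<usize> {
    let digits = source.iter().take_while(|c| c.is_ascii_digit()).count();
    if digits == 0 {
        None
    } else {
        Some(digits)
    }
}
```

Under these assumptions, `0x1Fg` is rejected by the strict hex lexer, but a decimal lexer tried afterwards still matches its first character, the `0`. That is the counterintuitive fallback the note warns about.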
}

#[test]
fn lexes_bad_hex() {
Collaborator

I would rename this to does_not_lex_bad_hex. Even better would be to split these two tests into many smaller ones.

Contributor Author

Went with the "even better" option.
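For readers following along, splitting one looping test into many small ones might look roughly like the sketch below. The helper `is_hex_literal` and the test names are hypothetical stand-ins, not the functions or tests in this PR. The point is only that each case gets its own descriptively named `#[test]`, so a failure identifies the exact input.

```rust
/// Hypothetical whole-token check standing in for the lexer under test.
fn is_hex_literal(token: &str) -> bool {
    match token.strip_prefix("0x") {
        Some(digits) => {
            !digits.is_empty() && digits.chars().all(|c| c.is_ascii_hexdigit())
        }
        None => false,
    }
}

// One small test per case, instead of a single loop over many inputs.

#[test]
fn lexes_lowercase_hex() {
    assert!(is_hex_literal("0x1a2b"));
}

#[test]
fn lexes_uppercase_hex_digits() {
    assert!(is_hex_literal("0xFF"));
}

#[test]
fn does_not_lex_prefix_without_digits() {
    assert!(!is_hex_literal("0x"));
}

#[test]
fn does_not_lex_hex_with_trailing_letter() {
    assert!(!is_hex_literal("0x1Fg"));
}
```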

@ccoVeille
Contributor

I'm worried about possible reporting of leet speak like 0xf0rd

Then it's a matter of balance: which is more likely to be found in text, hexadecimal or leet speak?
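A small sketch may help separate the cases being weighed here. The `Kind` enum and `classify` function are hypothetical and not part of Harper. They only show that some leet-looking tokens (such as `0xf00d`) are also valid hex, while others (such as `0xf0rd`) merely start like hex.

```rust
/// Hypothetical classification of a whitespace-delimited token.
#[derive(Debug, PartialEq)]
enum Kind {
    /// `0x` followed only by hex digits.
    Hex,
    /// Starts with `0x` but is not a clean hex number.
    HexLookalike,
    /// Anything else.
    Other,
}

fn classify(token: &str) -> Kind {
    match token.strip_prefix("0x") {
        Some(rest) if !rest.is_empty() && rest.chars().all(|c| c.is_ascii_hexdigit()) => {
            Kind::Hex
        }
        Some(_) => Kind::HexLookalike,
        None => Kind::Other,
    }
}
```

Here `classify("0xf00d")` yields `Kind::Hex` and `classify("0xf0rd")` yields `Kind::HexLookalike`. What a linter should do with the second kind (flag it, or leave it for a leet-speak rule) is exactly the open question.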

I updated #555 accordingly.

I would prefer hex rules to be merged after leet speak is handled #598
