Undetected grammar issues. #90

Open
madushan1000 opened this issue Sep 16, 2024 · 0 comments

@madushan1000

I'm evaluating nlprule for a browser plugin. I managed to get it running in the browser without much work.
But I noticed that nlprule misses quite a few errors that LanguageTool detects. For example, take the string below:

"A sentence with a error in the Hitchhiker's Guide tot he Galaxy.He and I is the dude."
The LanguageTool website detects 5 issues.

The LanguageTool Java library detects only 3 (I guess because the n-gram data is missing). One of them is a spelling mistake.

Potential error at characters 16-17: Use <suggestion>an</suggestion> instead of 'a' if the following word starts with a vowel sound, e.g. 'an article', 'an hour'.
Suggested correction(s): [an]
Potential error at characters 31-41: Possible spelling mistake. 'Hitchhiker' is American English.
Suggested correction(s): [Hitch-hiker]
Potential error at characters 50-56: Did you mean <suggestion>to the</suggestion>?
Suggested correction(s): [to the]
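The first error above is the classic a/an check. LanguageTool appears to implement it as a Java rule (`AvsAnRule`) rather than as an XML pattern, which, if correct, would explain why nlprule, which compiles the XML rule files, never reports it. The character ranges in the output also look half-open: 16-17 covers only the `a`, and 50-56 covers `tot he`. A toy Rust sketch of such a check follows. `check_a_an`, `Flag`, and the exception lists are hypothetical, and the first-letter heuristic is a deliberate simplification; a real rule needs far larger word lists for vowel sounds.

```rust
/// A flagged span: half-open *character* offsets plus the suggested article.
/// (Hypothetical type for this sketch, not part of nlprule or LanguageTool.)
#[derive(Debug, PartialEq)]
pub struct Flag {
    pub start: usize,
    pub end: usize,
    pub replacement: String,
}

/// Split `text` into runs of alphabetic characters, returning
/// (start, end, word) with half-open character (not byte) offsets.
fn words(text: &str) -> Vec<(usize, usize, String)> {
    let mut out = Vec::new();
    let mut current = String::new();
    let mut start = 0;
    for (i, c) in text.chars().enumerate() {
        if c.is_alphabetic() {
            if current.is_empty() {
                start = i;
            }
            current.push(c);
        } else if !current.is_empty() {
            out.push((start, i, std::mem::take(&mut current)));
        }
    }
    if !current.is_empty() {
        out.push((start, text.chars().count(), current));
    }
    out
}

/// Toy heuristic: a vowel as the first letter means a vowel sound, overridden
/// by tiny, hypothetical exact-match exception lists.
fn starts_with_vowel_sound(word: &str) -> bool {
    const SILENT_H: [&str; 2] = ["hour", "honest"];
    const CONSONANT_SOUND: [&str; 2] = ["university", "one"];
    let lower = word.to_lowercase();
    if SILENT_H.contains(&lower.as_str()) {
        return true;
    }
    if CONSONANT_SOUND.contains(&lower.as_str()) {
        return false;
    }
    matches!(lower.chars().next(), Some('a' | 'e' | 'i' | 'o' | 'u'))
}

/// Flag "a" before a vowel sound and "an" before a consonant sound.
pub fn check_a_an(text: &str) -> Vec<Flag> {
    let ws = words(text);
    let mut flags = Vec::new();
    for pair in ws.windows(2) {
        let (start, end, article) = &pair[0];
        let next = &pair[1].2;
        let suggestion = match (article.to_lowercase().as_str(), starts_with_vowel_sound(next)) {
            ("a", true) => "an",
            ("an", false) => "a",
            _ => continue,
        };
        // Keep an initial capital: "An cat" -> "A".
        let replacement = if article.starts_with(char::is_uppercase) {
            suggestion[..1].to_uppercase() + &suggestion[1..]
        } else {
            suggestion.to_string()
        };
        flags.push(Flag { start: *start, end: *end, replacement });
    }
    flags
}
```

On the sentence from this issue, the sketch returns a single flag spanning 16..17 with replacement `an`, matching the range LanguageTool reports.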

nlprule detects only one:

TYPOS/TOT_HE/0
Did you mean to the?
[ 'to the' ]
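By contrast, `TOT_HE` looks like a plain token-sequence pattern (a token `tot` followed by `he`). That is the kind of XML rule nlprule does compile. The exact definition of the real rule is an assumption here. The following is a minimal sketch of matching such a literal pattern over tokens with half-open character offsets. `find_pattern` and its tokenizer are hypothetical stand-ins; the tokenizer only splits on non-alphanumeric characters and is far simpler than nlprule's.

```rust
/// A token with half-open character offsets into the original text.
struct Token {
    start: usize,
    end: usize,
    text: String,
}

/// Hypothetical stand-in tokenizer: keeps runs of alphanumeric characters
/// and drops everything else (whitespace, punctuation, apostrophes).
fn tokenize(text: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut start = 0;
    for (i, c) in text.chars().enumerate() {
        if c.is_alphanumeric() {
            if current.is_empty() {
                start = i;
            }
            current.push(c);
        } else if !current.is_empty() {
            tokens.push(Token { start, end: i, text: std::mem::take(&mut current) });
        }
    }
    if !current.is_empty() {
        tokens.push(Token { start, end: text.chars().count(), text: current });
    }
    tokens
}

/// Return the character span of every run of consecutive tokens that equals
/// `pattern` case-insensitively, e.g. ["tot", "he"].
pub fn find_pattern(text: &str, pattern: &[&str]) -> Vec<(usize, usize)> {
    if pattern.is_empty() {
        return Vec::new(); // `windows(0)` would panic
    }
    tokenize(text)
        .windows(pattern.len())
        .filter(|w| {
            w.iter()
                .zip(pattern)
                .all(|(t, p)| t.text.to_lowercase() == p.to_lowercase())
        })
        .map(|w| (w[0].start, w[w.len() - 1].end))
        .collect()
}
```

On the sentence from this issue, `find_pattern(text, &["tot", "he"])` yields the single span (50, 56), the same range LanguageTool reports for this error.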

I tried building the tokenizer/rules databases from the latest LanguageTool data, but that didn't improve things. (I couldn't figure out how to update the tokenizer model; it looks like OpenNLP doesn't offer its new models in the format nlprule uses.)

Could this be because of the unimplemented rules? Can you give some guidance on how to implement the missing ones? I'm talking about these:

[2024-09-16T15:50:55Z INFO  nlprule::compile] Reading common words from data/en/common.txt.
[2024-09-16T15:50:55Z INFO  nlprule::compile] Creating tagger.
[2024-09-16T15:50:56Z INFO  nlprule::compile] Regex cache at data/en/regex_cache.bin is valid.
[2024-09-16T15:50:56Z INFO  nlprule::compile] data/en/chunker.json exists. Building chunker.
[2024-09-16T15:50:56Z INFO  nlprule::compile] data/en/tags/multiwords.txt exists. Building multiword tagger.
[2024-09-16T15:50:56Z INFO  nlprule::compile] Creating tokenizer.
[2024-09-16T15:50:57Z WARN  nlprule::compile::impls] Error constructing Disambiguator: [Rule] feature not implemented: postag not supported for `add`.
[2024-09-16T15:50:57Z INFO  nlprule::compile] Creating grammar rules.
[2024-09-16T15:51:00Z WARN  nlprule::compile::impls] Errors constructing Rules: [
        "[Rule] feature not implemented: postag, postag_regex, postag_replace and text in `match` are not implemented. (n=242)",
        "[Structure] custom: unknown variant `example`, expected one of `token`, `marker`, `or`, `and`, `feature` (n=212)",
        "[Rule] feature not implemented: examples with `type` (i. e. 'triggers_error') are not implemented. (n=43)",
        "[Rule] feature not implemented: rules with no suggestion are not implemented. (n=35)",
        "[Rule] feature not implemented: rules with filter are not implemented. (n=31)",
        "[Structure] custom: unknown field `tags`, expected one of `pattern`, `regexp`, `antipattern`, `message`, `suggestion`, `example`, `id`, `name`, `short`, `url`, `default`, `filter`, `__unused_unifications` (n=11)",
        "[Structure] custom: unknown field `tags`, expected one of `id`, `antipattern`, `default`, `name`, `short`, `url`, `rule` (n=10)",
        "[Structure] custom: unknown field `type`, expected one of `id`, `antipattern`, `default`, `name`, `short`, `url`, `rule` (n=8)",
        "[Structure] custom: unknown field `type`, expected one of `pattern`, `regexp`, `antipattern`, `message`, `suggestion`, `example`, `id`, `name`, `short`, `url`, `default`, `filter`, `__unused_unifications` (n=7)",
        "[Structure] custom: missing field `$value` (n=3)",
        "[Structure] custom: unknown field `raw_pos`, expected `case_sensitive` or `$value` (n=2)",
        "[Rule] regex parse error:\n    (?)id\n     ^\nerror: repetition operator missing expression (n=2)",
        "[Rule] feature not implemented: case conversion preserve not supported. (n=2)",
        "[Structure] custom: unknown field `type`, expected one of `text`, `case_sensitive`, `mark` (n=1)",
        "[Structure] custom: unknown field `postag`, expected `no` (n=1)",
        "[Rule] regex parse error:\n    (?)ife\n     ^\nerror: repetition operator missing expression (n=1)",
        "[Rule] feature not implemented: include_skipped in `match` is not implemented. (n=1)",
        "[Rule] feature not implemented: control flow in parallel tokens is not implemented. (n=1)",
    ]
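To decide which missing feature to tackle first, the `(n=…)` counts in the log above can be totalled. The following is a minimal Rust sketch. It assumes every error string ends in an `(n=…)` suffix, as in this output. `error_count` and `tally` are hypothetical helpers, not part of nlprule.

```rust
/// Parse the trailing `(n=…)` count from one nlprule compile-error string,
/// e.g. "... are not implemented. (n=31)" -> Some(31).
/// Returns None when the suffix is missing or the number does not parse.
pub fn error_count(line: &str) -> Option<u32> {
    let start = line.rfind("(n=")? + "(n=".len();
    let rest = &line[start..];
    let end = rest.find(')')?;
    rest[..end].parse().ok()
}

/// Sum the counts over all error strings; strings without a count add nothing.
pub fn tally(lines: &[&str]) -> u32 {
    lines.iter().filter_map(|l| error_count(l)).sum()
}
```

The 18 entries listed sum to 613. The first two categories, postag in `match` (242) and the unknown `example` variant (212), account for 454 of them, roughly 74%. If each count corresponds to one rule skipped during compilation, those two look like the highest-leverage features to implement.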