Undetected grammar issues. #90

madushan1000 · 2024-09-16T15:53:40Z

I'm evaluating nlprule for a browser plugin. I managed to get it running in the browser without much work.
However, I noticed that nlprule misses quite a few errors that LanguageTool catches. For example, take the following string:

"A sentence with a error in the Hitchhiker's Guide tot he Galaxy.He and I is the dude."
The LanguageTool website detects 5 issues.

The LanguageTool Java library detects only 3 (I guess because the n-gram data is missing). One of them is a spelling mistake.

Potential error at characters 16-17: Use <suggestion>an</suggestion> instead of 'a' if the following word starts with a vowel sound, e.g. 'an article', 'an hour'.
Suggested correction(s): [an]
Potential error at characters 31-41: Possible spelling mistake. 'Hitchhiker' is American English.
Suggested correction(s): [Hitch-hiker]
Potential error at characters 50-56: Did you mean <suggestion>to the</suggestion>?
Suggested correction(s): [to the]
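To make the reported positions easier to read, here is a minimal Python sketch that slices the test string at the offsets printed above. It assumes the offsets are zero-based, half-open character ranges; that assumption is not stated in the output itself, but it does line up with the flagged words.

```python
text = "A sentence with a error in the Hitchhiker's Guide tot he Galaxy.He and I is the dude."

# (start, end) offsets copied from the LanguageTool output above.
# Assumption: zero-based, end-exclusive character ranges.
reported = [(16, 17), (31, 41), (50, 56)]

# Slice the text at each reported range to see what was flagged.
flagged = [text[start:end] for start, end in reported]

for (start, end), span in zip(reported, flagged):
    print(f"{start}-{end}: {span!r}")
# 16-17: 'a'
# 31-41: 'Hitchhiker'
# 50-56: 'tot he'
```

None of the three spans covers the tail of the string ("Galaxy.He and I is the dude."), so anything wrong there is not among the issues the Java library reported.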

nlprule detects only one.

TYPOS/TOT_HE/0
Did you mean to the?
[ 'to the' ]

I tried building the tokenizer/rules databases from the latest LanguageTool data (I couldn't figure out how to update the tokenizer model; it looks like OpenNLP doesn't offer its new models in the format nlprule uses), but that didn't improve things.

Could this be because of the unimplemented rules? Can you give some guidance on how to implement the missing rules? I'm referring to these compile warnings:

[2024-09-16T15:50:55Z INFO  nlprule::compile] Reading common words from data/en/common.txt.
[2024-09-16T15:50:55Z INFO  nlprule::compile] Creating tagger.
[2024-09-16T15:50:56Z INFO  nlprule::compile] Regex cache at data/en/regex_cache.bin is valid.
[2024-09-16T15:50:56Z INFO  nlprule::compile] data/en/chunker.json exists. Building chunker.
[2024-09-16T15:50:56Z INFO  nlprule::compile] data/en/tags/multiwords.txt exists. Building multiword tagger.
[2024-09-16T15:50:56Z INFO  nlprule::compile] Creating tokenizer.
[2024-09-16T15:50:57Z WARN  nlprule::compile::impls] Error constructing Disambiguator: [Rule] feature not implemented: postag not supported for `add`.
[2024-09-16T15:50:57Z INFO  nlprule::compile] Creating grammar rules.
[2024-09-16T15:51:00Z WARN  nlprule::compile::impls] Errors constructing Rules: [
        "[Rule] feature not implemented: postag, postag_regex, postag_replace and text in `match` are not implemented. (n=242)",
        "[Structure] custom: unknown variant `example`, expected one of `token`, `marker`, `or`, `and`, `feature` (n=212)",
        "[Rule] feature not implemented: examples with `type` (i. e. 'triggers_error') are not implemented. (n=43)",
        "[Rule] feature not implemented: rules with no suggestion are not implemented. (n=35)",
        "[Rule] feature not implemented: rules with filter are not implemented. (n=31)",
        "[Structure] custom: unknown field `tags`, expected one of `pattern`, `regexp`, `antipattern`, `message`, `suggestion`, `example`, `id`, `name`, `short`, `url`, `default`, `filter`, `__unused_unifications` (n=11)",
        "[Structure] custom: unknown field `tags`, expected one of `id`, `antipattern`, `default`, `name`, `short`, `url`, `rule` (n=10)",
        "[Structure] custom: unknown field `type`, expected one of `id`, `antipattern`, `default`, `name`, `short`, `url`, `rule` (n=8)",
        "[Structure] custom: unknown field `type`, expected one of `pattern`, `regexp`, `antipattern`, `message`, `suggestion`, `example`, `id`, `name`, `short`, `url`, `default`, `filter`, `__unused_unifications` (n=7)",
        "[Structure] custom: missing field `$value` (n=3)",
        "[Structure] custom: unknown field `raw_pos`, expected `case_sensitive` or `$value` (n=2)",
        "[Rule] regex parse error:\n    (?)id\n     ^\nerror: repetition operator missing expression (n=2)",
        "[Rule] feature not implemented: case conversion preserve not supported. (n=2)",
        "[Structure] custom: unknown field `type`, expected one of `text`, `case_sensitive`, `mark` (n=1)",
        "[Structure] custom: unknown field `postag`, expected `no` (n=1)",
        "[Rule] regex parse error:\n    (?)ife\n     ^\nerror: repetition operator missing expression (n=1)",
        "[Rule] feature not implemented: include_skipped in `match` is not implemented. (n=1)",
        "[Rule] feature not implemented: control flow in parallel tokens is not implemented. (n=1)",
    ]
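As a rough way to see where the skipped rules concentrate, the `(n=…)` counts from the grammar-rule warnings above can be tallied by message prefix. This is only a sketch: the counts are copied by hand from the log, the grouping keys are names chosen here for illustration, and it assumes each `n` is the number of rules that failed with that message.

```python
# Counts copied from the "Errors constructing Rules" list above,
# grouped by message prefix. The group names are illustrative labels.
counts = {
    "[Rule] feature not implemented": [242, 43, 35, 31, 2, 1, 1],
    "[Structure] custom": [212, 11, 10, 8, 7, 3, 2, 1, 1],
    "[Rule] regex parse error": [2, 1],
}

# Sum the per-message counts within each group.
totals = {prefix: sum(ns) for prefix, ns in counts.items()}
grand_total = sum(totals.values())

for prefix, total in totals.items():
    print(f"{prefix}: {total} ({total / grand_total:.0%})")
print(f"total: {grand_total}")
# [Rule] feature not implemented: 355 (58%)
# [Structure] custom: 255 (42%)
# [Rule] regex parse error: 3 (0%)
# total: 613
```

Under that assumption, two messages account for most of the total: the `postag`/`text` in `match` feature (n=242) and the unknown `example` variant (n=212).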
