I'm evaluating nlprule for a browser plugin. I managed to get it running in the browser without much work.
However, I noticed that nlprule misses quite a few errors that LanguageTool catches. For example, take the following string:
"A sentence with a error in the Hitchhiker's Guide tot he Galaxy.He and I is the dude."
The LanguageTool website detects 5 issues.
The LanguageTool Java library detects only 3 (I guess because the n-gram data is missing). One of them is a spelling mistake.
Potential error at characters 16-17: Use <suggestion>an</suggestion> instead of 'a' if the following word starts with a vowel sound, e.g. 'an article', 'an hour'.
Suggested correction(s): [an]
Potential error at characters 31-41: Possible spelling mistake. 'Hitchhiker' is American English.
Suggested correction(s): [Hitch-hiker]
Potential error at characters 50-56: Did you mean <suggestion>to the</suggestion>?
Suggested correction(s): [to the]
nlprule detected only one:
TYPOS/TOT_HE/0
Did you mean to the?
[ 'to the' ]
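To compare the two outputs span by span, it helps to confirm how the LanguageTool character ranges map onto the test string. The sketch below assumes the ranges are zero-based and half-open (end exclusive), which is consistent with the three matches reported above; the variable names are only illustrative.

```python
# The test string from this issue, split across two literals for readability.
text = ("A sentence with a error in the Hitchhiker's Guide tot he Galaxy."
        "He and I is the dude.")

# (start, end) pairs copied from the LanguageTool Java output above.
# Assumption: zero-based, end-exclusive character offsets.
spans = [(16, 17), (31, 41), (50, 56)]

# Slice out the text each match points at.
flagged = [text[start:end] for start, end in spans]
print(flagged)  # ['a', 'Hitchhiker', 'tot he']
```

Only the last of these, `tot he`, corresponds to the single nlprule match (`TYPOS/TOT_HE/0`).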
I tried building the tokenizer/rules databases from the latest LanguageTool data (I couldn't figure out how to update the tokenizer model; it looks like OpenNLP doesn't offer its new models in the format nlprule uses). But that didn't improve things.
Could this be because of the unimplemented rules? Can you give some guidance on how to implement the missing ones? I mean the ones reported in this build log:
[2024-09-16T15:50:55Z INFO nlprule::compile] Reading common words from data/en/common.txt.
[2024-09-16T15:50:55Z INFO nlprule::compile] Creating tagger.
[2024-09-16T15:50:56Z INFO nlprule::compile] Regex cache at data/en/regex_cache.bin is valid.
[2024-09-16T15:50:56Z INFO nlprule::compile] data/en/chunker.json exists. Building chunker.
[2024-09-16T15:50:56Z INFO nlprule::compile] data/en/tags/multiwords.txt exists. Building multiword tagger.
[2024-09-16T15:50:56Z INFO nlprule::compile] Creating tokenizer.
[2024-09-16T15:50:57Z WARN nlprule::compile::impls] Error constructing Disambiguator: [Rule] feature not implemented: postag not supported for `add`.
[2024-09-16T15:50:57Z INFO nlprule::compile] Creating grammar rules.
[2024-09-16T15:51:00Z WARN nlprule::compile::impls] Errors constructing Rules: [
"[Rule] feature not implemented: postag, postag_regex, postag_replace and text in `match` are not implemented. (n=242)",
"[Structure] custom: unknown variant `example`, expected one of `token`, `marker`, `or`, `and`, `feature` (n=212)",
"[Rule] feature not implemented: examples with `type` (i. e. 'triggers_error') are not implemented. (n=43)",
"[Rule] feature not implemented: rules with no suggestion are not implemented. (n=35)",
"[Rule] feature not implemented: rules with filter are not implemented. (n=31)",
"[Structure] custom: unknown field `tags`, expected one of `pattern`, `regexp`, `antipattern`, `message`, `suggestion`, `example`, `id`, `name`, `short`, `url`, `default`, `filter`, `__unused_unifications` (n=11)",
"[Structure] custom: unknown field `tags`, expected one of `id`, `antipattern`, `default`, `name`, `short`, `url`, `rule` (n=10)",
"[Structure] custom: unknown field `type`, expected one of `id`, `antipattern`, `default`, `name`, `short`, `url`, `rule` (n=8)",
"[Structure] custom: unknown field `type`, expected one of `pattern`, `regexp`, `antipattern`, `message`, `suggestion`, `example`, `id`, `name`, `short`, `url`, `default`, `filter`, `__unused_unifications` (n=7)",
"[Structure] custom: missing field `$value` (n=3)",
"[Structure] custom: unknown field `raw_pos`, expected `case_sensitive` or `$value` (n=2)",
"[Rule] regex parse error:\n (?)id\n ^\nerror: repetition operator missing expression (n=2)",
"[Rule] feature not implemented: case conversion preserve not supported. (n=2)",
"[Structure] custom: unknown field `type`, expected one of `text`, `case_sensitive`, `mark` (n=1)",
"[Structure] custom: unknown field `postag`, expected `no` (n=1)",
"[Rule] regex parse error:\n (?)ife\n ^\nerror: repetition operator missing expression (n=1)",
"[Rule] feature not implemented: include_skipped in `match` is not implemented. (n=1)",
"[Rule] feature not implemented: control flow in parallel tokens is not implemented. (n=1)",
]
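To get a rough sense of how many rules are being dropped at compile time, the `(n=...)` counts in the warning list can be summed per category. This is a minimal sketch that only parses the log text; it does not call nlprule, and the `tally` helper and the three-line sample are illustrative, not part of any library.

```python
import re
from collections import Counter

# Matches entries such as: "[Rule] feature not implemented: ... (n=35)",
ENTRY = re.compile(r"\[(Rule|Structure)\].*\(n=(\d+)\)")

def tally(log_lines):
    """Sum the (n=...) counts per category ("Rule" or "Structure")."""
    per_kind = Counter()
    for line in log_lines:
        match = ENTRY.search(line)
        if match:
            per_kind[match.group(1)] += int(match.group(2))
    return per_kind

# Three entries copied from the warning list above.
sample = [
    '"[Rule] feature not implemented: rules with no suggestion are not implemented. (n=35)",',
    '"[Rule] feature not implemented: rules with filter are not implemented. (n=31)",',
    '"[Structure] custom: missing field `$value` (n=3)",',
]

counts = tally(sample)
print(dict(counts))          # {'Rule': 66, 'Structure': 3}
print(sum(counts.values()))  # 69
```

Applied to the full warning list above, the counts add up to 613 skipped rules (358 `[Rule]`, 255 `[Structure]`), with the largest single group being the 242 rules that use `postag`, `postag_regex`, `postag_replace` or `text` in `match`.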