If training a custom NER model, is there a meaningful difference between using SpanCategorizer vs. "regular" NER? #9681

Lolologist · 2021-11-16T18:34:00Z

I want to train one or more custom entities and also get scores for them. I know that one current way to do that is SpanCategorizer, and that SpanCategorizer's suggesters (https://spacy.io/api/spancategorizer#suggesters) work differently from however NER finds its candidate spans. The documentation also seems to hint heavily that SpanCategorizer is intended for arbitrary spans that aren't really entities per se.

Is there anything fundamentally different in how they work that would make it a bad idea, or at least something needing careful attention, to train traditional entities via SpanCategorizer?

polm · 2021-11-17T03:52:28Z

See this comment. Basically, SpanCat is more general and gets to make fewer assumptions, so it can be less accurate. Depending on the application, the difference in accuracy compared with an NER component may be negligible or it may be more significant.
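To make the "fewer assumptions" point a bit more concrete, here is a minimal pure-Python sketch. It is not spaCy code. It assumes an n-gram style suggester, which proposes every span of the configured sizes, so candidates can overlap freely, whereas an NER-style result is a set of non-overlapping spans. The names `ngram_candidates` and `has_overlap` and the example sentence are hypothetical and only for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def ngram_candidates(n_tokens: int, sizes: List[int]) -> List[Span]:
    """Enumerate every span of the given sizes, as an n-gram style suggester would."""
    spans = []
    for size in sizes:
        for start in range(0, n_tokens - size + 1):
            spans.append((start, start + size))
    return spans


def has_overlap(spans: List[Span]) -> bool:
    """Return True if any two spans share at least one token."""
    ordered = sorted(spans)
    # After sorting by start offset, checking neighbours is sufficient.
    return any(prev[1] > nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


tokens = ["Apple", "opened", "a", "store", "in", "New", "York"]

# Span-categorizer view: many overlapping candidates, each scored independently.
candidates = ngram_candidates(len(tokens), sizes=[1, 2, 3])

# NER view (assumed example output): a few non-overlapping entity spans.
ner_style = [(0, 1), (5, 7)]

print(len(candidates), has_overlap(candidates), has_overlap(ner_style))
```

Because the candidate set is this broad, the span categorizer has to reject most of what it is shown, which is one plausible reason it can trail a dedicated NER component on conventional entities.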

Lolologist Nov 17, 2021
Author

Thanks very much for the reply! A semi-unrelated question: if I write a different span suggester (probably one based on dependency parses, in case you're looking for "phrases"), is that something you all would be interested in incorporating?

svlandeg Nov 17, 2021
Maintainer

Hi @Lolologist, I don't think we want to have a bunch of different suggesters in the library, as often people will still want to be able to tweak some things or implement their own. But if you have some useful implementations that you feel are generic enough, it might make sense to publish them in the spaCy universe?
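As a rough illustration of what a dependency-based suggester could compute, the sketch below proposes one candidate span per token: the extent of that token's dependency subtree. It is plain Python, not a drop-in spaCy component. The function name `subtree_candidates` and the head-index input format are assumptions made for this example. In spaCy the same extents are available via `token.left_edge` and `token.right_edge`, and a real suggester would additionally have to be registered and return its candidates in the format described in the suggesters documentation linked above, which is not shown here.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def subtree_candidates(heads: List[int]) -> List[Span]:
    """Propose one candidate span per token: the extent of its dependency subtree.

    heads[i] is the index of token i's head; the root points to itself.
    Assumes a valid tree (no cycles). For non-projective trees the span
    simply runs from the leftmost to the rightmost descendant.
    """
    n = len(heads)
    left = list(range(n))
    right = list(range(n))
    for i in range(n):
        # Walk up from token i, widening every ancestor's extent to include i.
        node = i
        while heads[node] != node:
            node = heads[node]
            left[node] = min(left[node], i)
            right[node] = max(right[node], i)
    # Deduplicate while preserving token order.
    seen = set()
    spans = []
    for start, end in zip(left, right):
        span = (start, end + 1)
        if span not in seen:
            seen.add(span)
            spans.append(span)
    return spans


# Hypothetical parse of "She bought a red car": "bought" is the root,
# "a" and "red" attach to "car", and "She" and "car" attach to "bought".
words = ["She", "bought", "a", "red", "car"]
heads = [1, 1, 4, 4, 1]
for start, end in subtree_candidates(heads):
    print((start, end), " ".join(words[start:end]))
```

Compared with an n-gram suggester, this yields far fewer candidates (at most one per token), at the cost of depending on parser quality.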
