Deconflicting Doc.Ents #13331
Closed
calebsmo started this conversation in Help: Best practices
No longer relevant.
Really hope this question makes sense, and I apologize in advance if it doesn't. I'll try to keep this brief.
I am using spaCy version 2.2.4. We have a custom `adjust_spans.py` component that is added to our pipeline like so:
Our model breaks paragraphs into sentences, then runs the nlp/entity detection at the sentence level. Our `adjust_spans` component is called multiple times for each sentence and keeps adding the same entities to `doc.ents`. Inevitably, it detects the same entity twice and throws an error like this:
ValueError: [E103] Trying to set conflicting doc.ents: '(7, 8, 'SELECTIVE_SERVICE')' and '(7, 8, 'SELECTIVE_SERVICE')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
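The rule behind E103 is that no token may belong to more than one entity, and two identical spans trivially violate it. The following is a minimal pure-Python sketch of that check, assuming entities are represented as hypothetical `(start, end, label)` tuples with an exclusive `end`, mirroring the tuple shown in the error message. `find_conflicts` is an illustrative helper, not part of spaCy.

```python
from typing import List, Tuple

# Hypothetical representation: (start_token, end_token_exclusive, label)
Ent = Tuple[int, int, str]


def find_conflicts(ents: List[Ent]) -> List[Tuple[Ent, Ent]]:
    """Return every pair of entities that share at least one token."""
    conflicts = []
    for i, a in enumerate(ents):
        for b in ents[i + 1:]:
            # Half-open ranges [a_start, a_end) and [b_start, b_end) overlap
            # exactly when each one starts before the other ends.
            if a[0] < b[1] and b[0] < a[1]:
                conflicts.append((a, b))
    return conflicts


ents = [(7, 8, "SELECTIVE_SERVICE"), (7, 8, "SELECTIVE_SERVICE")]
print(find_conflicts(ents))
```

Adding the same span a second time produces exactly the kind of pair reported in the error above.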
The model had previously added this same entity to `doc.ents`, but because the component runs over the same sentence repeatedly, it tries to add the entity again and triggers this error.
How can I prevent the model from iterating over the same entities once they have been added?
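One common approach is to merge new entities into the existing ones and skip any candidate whose tokens are already claimed, so that re-running over the same sentence becomes a no-op. Below is a minimal sketch, assuming the same hypothetical `(start, end, label)` tuples and a "first entity wins" policy; `merge_ents` is an illustrative helper, not a spaCy API.

```python
from typing import List, Tuple

# Hypothetical representation: (start_token, end_token_exclusive, label)
Ent = Tuple[int, int, str]


def merge_ents(existing: List[Ent], new: List[Ent]) -> List[Ent]:
    """Keep all existing entities; add new ones only if no token is taken."""
    taken = set()
    merged = []
    for start, end, label in existing:
        merged.append((start, end, label))
        taken.update(range(start, end))
    for start, end, label in new:
        # Exact duplicates and partial overlaps are both skipped here.
        if any(i in taken for i in range(start, end)):
            continue
        merged.append((start, end, label))
        taken.update(range(start, end))
    return sorted(merged)


existing = [(7, 8, "SELECTIVE_SERVICE")]
new = [(7, 8, "SELECTIVE_SERVICE"), (10, 12, "ORG")]
print(merge_ents(existing, new))
```

Inside a spaCy 2.x component, the existing tuples could be built from `(e.start, e.end, e.label_)` for each `e` in `doc.ents`, and the merged result converted back to `Span(doc, start, end, label=label)` objects before assigning `doc.ents`. Alternatively, `spacy.util.filter_spans` removes duplicates and overlaps from a list of `Span` objects, but it prefers the longest span rather than the one added first, so the choice depends on which entity should win.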