NER Example #2

giovaninb · 2022-08-07T18:51:13Z

Why is there a Ġ tag in the output?

Expected output with the predicted entities:

[
{'word': 'Ġcalcio', 'score': 0.9963880181312561, 'entity': 'B-NORMALIZABLES', 'index': 24, 'start': 137, 'end': 143},
{'word': 'Ġcalcio', 'score': 0.9965023398399353, 'entity': 'B-NORMALIZABLES', 'index': 29, 'start': 163, 'end': 169},
{'word': 'Ġmagnesio', 'score': 0.996299147605896, 'entity': 'B-NORMALIZABLES', 'index': 32, 'start': 178, 'end': 186},
{'word': 'ĠPTH', 'score': 0.9950509667396545, 'entity': 'B-PROTEINAS', 'index': 34, 'start': 189, 'end': 192}
]

gonzalez-agirre · 2022-08-08T08:36:57Z

Hi Giovani, in the RoBERTa and GPT-2 tokenizers, the space before a word is always part of the subword. The special character Ġ is used to mark that space. Keep in mind that a word may be split into two or more subwords, so this marker also distinguishes the start of a word from a continuation subword. For instance, creatinina can be divided into 'Ġcreat' and 'inina' (note that 'inina' does not start with the marker). Best, Aitor.
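If whole words are needed rather than subwords, the predictions can be post-processed. The snippet below is only a rough sketch, not part of the model's own code: `merge_subwords` is a hypothetical helper, and the 'Ġcreat' / 'inina' entries with their offsets are made up to illustrate the split described above (only the 'Ġcalcio' entry comes from the output in this issue). It assumes that a subword without the Ġ marker continues the previous word when its `start` equals the previous `end`.

```python
SPACE_MARKER = "\u0120"  # the 'Ġ' character that stands for a leading space


def merge_subwords(predictions):
    """Merge token-level predictions into whole words (hypothetical helper).

    A token opens a new word when it starts with the marker, or when it does
    not directly follow the previous token in the text.
    """
    words = []
    for pred in predictions:
        token = pred["word"]
        continues_previous = (
            bool(words)
            and not token.startswith(SPACE_MARKER)
            and pred["start"] == words[-1]["end"]
        )
        if continues_previous:
            # Glue the continuation subword onto the word being built.
            words[-1]["word"] += token
            words[-1]["end"] = pred["end"]
        else:
            if token.startswith(SPACE_MARKER):
                token = token[len(SPACE_MARKER):]  # drop the marker
            words.append({
                "word": token,
                "entity": pred["entity"],  # keep the label of the first subword
                "start": pred["start"],
                "end": pred["end"],
            })
    return words


# Illustrative input: the first two entries (and their offsets) are invented.
predictions = [
    {"word": "\u0120creat", "entity": "B-NORMALIZABLES", "start": 10, "end": 15},
    {"word": "inina", "entity": "I-NORMALIZABLES", "start": 15, "end": 20},
    {"word": "\u0120calcio", "entity": "B-NORMALIZABLES", "start": 137, "end": 143},
]

merged = merge_subwords(predictions)
for word in merged:
    print(word)
```

When the predictions come from the Hugging Face `pipeline`, passing `aggregation_strategy="simple"` may be a simpler route, since it groups subwords into entities before returning them.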
