Hi Giovani,
In the RoBERTa and GPT-2 tokenizers, the space before a word is always part
of the subword that follows it. The special character Ġ marks that space. Keep
in mind that a word may be split into two or more subwords, so this marker
also distinguishes the first subword of a word from the subwords that continue it.
For instance, creatinina can be divided into 'Ġcreat' and 'inina' (note
that 'inina' does not start with the special character).
Best,
Aitor.
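
To illustrate the explanation above, here is a minimal sketch in plain Python of how the Ġ marker could be used to put subwords back together and to clean the `word` field of the predicted entities. The helper names `merge_subwords` and `clean_entities` are hypothetical and are not part of this repository or of the tokenizer library; the sketch only assumes that a token starting with Ġ begins a new word.

```python
# 'Ġ' (U+0120) is how the byte-level BPE vocabulary displays a leading space.
SPACE_MARKER = "\u0120"


def merge_subwords(tokens):
    """Group subword tokens into whole words using the leading-space marker."""
    words = []
    for token in tokens:
        if token.startswith(SPACE_MARKER) or not words:
            # A marked token (or the very first token) starts a new word.
            words.append(token.lstrip(SPACE_MARKER))
        else:
            # An unmarked token continues the previous word.
            words[-1] += token
    return words


def clean_entities(entities):
    """Return copies of the entity dicts with the marker removed from 'word'."""
    return [
        {**entity, "word": entity["word"].replace(SPACE_MARKER, " ").strip()}
        for entity in entities
    ]


print(merge_subwords(["Ġcreat", "inina", "Ġcalcio"]))
# ['creatinina', 'calcio']
```

If the predictions come from the Hugging Face `pipeline`, passing `aggregation_strategy="simple"` when creating it may already group subwords and return decoded words, in which case no manual cleanup would be needed.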
On Sun, Aug 7, 2022 at 8:51 PM Giovani Bettoni wrote:
Why is there a Ġ tag in the output?
Expected output with the predicted entities:
[
{'word': 'Ġcalcio', 'score': 0.9963880181312561, 'entity': 'B-NORMALIZABLES', 'index': 24, 'start': 137, 'end': 143},
{'word': 'Ġcalcio', 'score': 0.9965023398399353, 'entity': 'B-NORMALIZABLES', 'index': 29, 'start': 163, 'end': 169},
{'word': 'Ġmagnesio', 'score': 0.996299147605896, 'entity': 'B-NORMALIZABLES', 'index': 32, 'start': 178, 'end': 186},
{'word': 'ĠPTH', 'score': 0.9950509667396545, 'entity': 'B-PROTEINAS', 'index': 34, 'start': 189, 'end': 192}
]