We would like to use these issues to gauge user interest.
The BERT tokenizer is intended as an identical reimplementation of the original BERT tokenization. However, it is possible to replace the bert.tokenizer.internal.BasicTokenizer with a tokenizer built on tokenizedDocument.
The expectation is that this should not affect the model much, since the wordpiece encoding is unchanged, and it is these wordpiece-encoded sub-tokens that are the input to the model.
The advantages are that tokenizedDocument is considerably faster than BasicTokenizer and may offer better integration with Text Analytics Toolbox functionality.
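For illustration, a minimal sketch of what the swap could look like. Only tokenizedDocument and tokenDetails are standard Text Analytics Toolbox calls; the wordpiece hand-off at the end (the bert.tokenizer.internal.WordPieceTokenizer class and its tokenize call) is an assumption about this repository's internals and is shown commented out.

str = "BERT tokenization is split into basic tokenization, then wordpiece encoding.";

% Basic tokenization via Text Analytics Toolbox rather than BasicTokenizer.
% Lowercasing shown for an uncased model; a cased model would skip it.
doc = tokenizedDocument(lower(str));
td = tokenDetails(doc);
basicTokens = td.Token;   % string array of basic tokens

% Hypothetical hand-off to the existing wordpiece encoder; the class name
% and method below are assumptions, not confirmed repository API.
% wp = bert.tokenizer.internal.WordPieceTokenizer(vocabFile);
% subTokens = tokenize(wp, basicTokens);

The model input would then be built from subTokens exactly as before, which is why swapping only the basic-tokenization step is expected to have little effect on the model itself.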