There are several tokenization conventions (e.g., the tokens used for padding, separating segments, etc.) that need to be specified when doing wordpiece tokenization for BERT. Currently, some of these conventions are hard-coded, while others are exposed as function parameters. We should decide on a consistent approach here.
Also, more clearly delineate what belongs in RBERT vs. wordpiece.
(macmillancontentscience/wordpiece#15)
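To make the trade-off concrete, here is a minimal sketch (in Python, not the actual wordpiece/RBERT API; the function and parameter names are hypothetical) of the "everything as parameters with BERT defaults" approach, where all special-token conventions live in one signature instead of being scattered between constants and arguments:

```python
# Hypothetical sketch, not the wordpiece/RBERT API: every special-token
# convention is a keyword parameter with BERT's standard default, so callers
# can override any of them without touching package internals.

def tokenize_for_bert(
    pieces,
    cls_token="[CLS]",   # marks the start of a sequence
    sep_token="[SEP]",   # separates segments
    pad_token="[PAD]",   # pads sequences out to max_length
    max_length=8,
):
    """Wrap wordpiece output in special tokens and pad/truncate to max_length."""
    tokens = [cls_token] + list(pieces) + [sep_token]
    tokens += [pad_token] * (max_length - len(tokens))
    return tokens[:max_length]

print(tokenize_for_bert(["hello", "world"]))
# ['[CLS]', 'hello', 'world', '[SEP]', '[PAD]', '[PAD]', '[PAD]', '[PAD]']
```

The alternative consistent approach would be the mirror image: hard-code all of these as package-level constants and expose none of them. Either way, the decision about which tokens are conventions of the *model* (wordpiece's concern) versus conventions of the *pipeline* (RBERT's concern) would help settle where each default should live.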