Tokenizer special cases do not work around infix punctuation #5598
Labels
enhancement: Feature requests and improvements
feat / tokenizer: Feature: Tokenizer
lang / en: English language data and models
How to reproduce the behaviour
I would expect the two sentences below to be tokenized the same way. However, in the second sentence, the special cases for "won't" and "can't" are not applied.
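For anyone trying to picture the mechanism, here is a toy model in plain Python of how this kind of mismatch can arise. It is a sketch, not the library's actual code. The special-case table, the suffix and infix patterns, the `tokenize` function, and the two example strings are all made up for illustration. It assumes the tokenizer splits on whitespace, looks up special cases only on whole chunks (including after stripping suffixes), and emits infix-split pieces without looking them up again.

```python
import re

# Hypothetical special-case table: exact string -> pre-split tokens.
SPECIAL_CASES = {
    "won't": ["wo", "n't"],
    "can't": ["ca", "n't"],
}

# Hypothetical punctuation rules, chosen only for this illustration.
SUFFIX_RE = re.compile(r"[.,!?]$")   # one trailing punctuation character
INFIX_RE = re.compile(r"\.\.\.|/")   # "..." or "/" inside a chunk


def tokenize(text):
    """Toy tokenizer: special cases are matched on whole chunks only."""
    tokens = []
    for chunk in text.split():
        suffixes = []
        # Strip trailing punctuation, stopping as soon as the remainder
        # is a known special case.
        while chunk and chunk not in SPECIAL_CASES and SUFFIX_RE.search(chunk):
            suffixes.insert(0, chunk[-1])
            chunk = chunk[:-1]
        if chunk in SPECIAL_CASES:
            tokens.extend(SPECIAL_CASES[chunk])
        elif chunk:
            # Infix splitting: the pieces are emitted as they are, with
            # no second special-case lookup. This is the step that
            # produces the mismatch.
            offset = 0
            for match in INFIX_RE.finditer(chunk):
                if match.start() > offset:
                    tokens.append(chunk[offset:match.start()])
                tokens.append(match.group())
                offset = match.end()
            if offset < len(chunk):
                tokens.append(chunk[offset:])
        tokens.extend(suffixes)
    return tokens


print(tokenize("I won't / can't go."))
# ['I', 'wo', "n't", '/', 'ca', "n't", 'go', '.']
print(tokenize("I won't/can't go."))
# ['I', "won't", '/', "can't", 'go', '.']
```

In this model the fix would be to run the special-case lookup on each piece produced by the infix split, so that both inputs give the same tokens for "won't" and "can't".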
Your Environment