$ pip install .
$ cat raw-sentences.txt | tokens-per-word -t "north/t5_base_scand3M"
# or (if you have csvkit/csvtools)
$ csvcut -c "text-column" tabular-dataset.csv | tokens-per-word -t "north/t5_base_scand3M"
# or (if you have jq)
$ cat line-formatted-dataset.jsonl | jq -r '.["text-field"]' | tokens-per-word -t "north/t5_base_scand3M"
sorenmulli/tokens-per-word
How many tokens per word using given tokenizer and given dataset?
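For readers who want to see the metric itself, here is a minimal Python sketch of a tokens-per-word ratio over lines of text. It is not the tool's actual implementation: the function name `tokens_per_word`, the whitespace-based word count, and the toy tokenizer are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List


def tokens_per_word(lines: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Return total tokens divided by total whitespace-separated words.

    Assumption: a "word" is a whitespace-delimited chunk; the real tool may
    count words differently.
    """
    total_tokens = 0
    total_words = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        total_words += len(text.split())
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("input contains no words")
    return total_tokens / total_words


def toy_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: cut each word into pieces of up to 4 characters."""
    return [word[i:i + 4] for word in text.split() for i in range(0, len(word), 4)]


if __name__ == "__main__":
    sample = ["hello world", "", "tokenization"]
    print(tokens_per_word(sample, toy_tokenize))

    # With Hugging Face transformers installed, a real tokenizer could be
    # passed instead (requires downloading the model files):
    #   from transformers import AutoTokenizer
    #   tok = AutoTokenizer.from_pretrained("north/t5_base_scand3M")
    #   tokens_per_word(sample, tok.tokenize)
```

Passing the tokenizer as a plain callable keeps the ratio computation independent of any particular tokenizer library, so the same function works with the toy splitter above or with a pretrained tokenizer's `tokenize` method.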