stanfordnlp/sindhi-tokenization

sindhi-tokenization

Sindhi tokenization data from ISRA

This repository is a collection of text files. Token boundaries are marked in the tkns_ files and sentence boundaries in the stns_ files.

A tool in Stanza, convert_text_files.py, converts this data into a CoNLL-style format suitable for training a tokenizer. (The other annotation fields are left blank.)
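To illustrate what "CoNLL-style with the other annotations left blank" can look like, here is a minimal Python sketch. It is not the code of convert_text_files.py, and it does not read the tkns_ or stns_ files, since their exact layout is not described here. It assumes the ten-column CoNLL-U layout (ID, FORM, then eight annotation columns), with `_` for every blank field. The function name `to_conll_style` and the placeholder tokens are hypothetical.

```python
def to_conll_style(sentences):
    """Render tokenized sentences as CoNLL-U-style text with blank annotations.

    `sentences` is a list of sentences, each a list of token strings.
    Assumption: the ten-column CoNLL-U layout, where only ID and FORM are filled.
    """
    blocks = []
    for sentence in sentences:
        lines = []
        for index, token in enumerate(sentence, start=1):
            # ID and FORM are filled in; the remaining eight columns are "_".
            columns = [str(index), token] + ["_"] * 8
            lines.append("\t".join(columns))
        blocks.append("\n".join(lines))
    # Each sentence block is followed by a blank line.
    return "\n\n".join(blocks) + "\n\n"


if __name__ == "__main__":
    # Placeholder tokens stand in for real Sindhi text.
    example = [["tok1", "tok2", "tok3"], ["tok4", "tok5"]]
    print(to_conll_style(example), end="")
```

Only the token and sentence boundaries carry information in such a file, which is all a tokenizer needs for training; the lemma, tag, and dependency columns stay as `_`.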
