This project consists of two components. The first is a set of basic tools developed as part of the Kurdish BLARK project (see https://www.researchgate.net/profile/Hossein_Hassani11). The second is a set of corpora of the Kurmanji and Sorani dialects of Kurdish. The tools are written in Python (2.7) and currently include:

- a transliterator that converts Persian/Arabic-script text into Latin script,
- a tokenizer that splits text into tokens and uses regular expressions (RE) to remove special characters and numeral tokens,
- a stemmer that finds Kurmanji and Sorani stems,
- a word-level literal translator that uses a bidialectal dictionary to translate literally from Kurmanji to Sorani and vice versa,
- a recognizer for Kurdish proper names,
- several other tools for building dictionaries and keeping them sorted.

The code includes comments that help explain its logic.
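
To illustrate the tokenizer described above, here is a minimal sketch of regex-based tokenization that drops special characters and numeral tokens. It is not the project's actual code: the names `SPECIAL_CHARS`, `NUMERAL_TOKEN`, and `tokenize`, and the exact patterns, are assumptions made for illustration. The sketch is written for Python 3, whereas the project itself targets Python 2.7.

```python
# -*- coding: utf-8 -*-
import re

# Assumption: a "special character" is anything that is neither a word
# character (letter, digit, underscore) nor whitespace.
SPECIAL_CHARS = re.compile(r"[^\w\s]", re.UNICODE)

# Assumption: a "numeral token" is a token made up only of digits.
NUMERAL_TOKEN = re.compile(r"^\d+$", re.UNICODE)


def tokenize(text):
    """Split text on whitespace after removing special characters,
    then drop tokens that consist only of digits."""
    # Replace special characters with spaces so adjacent words stay separate.
    cleaned = SPECIAL_CHARS.sub(" ", text)
    return [tok for tok in cleaned.split() if not NUMERAL_TOKEN.match(tok)]


if __name__ == "__main__":
    print(tokenize("Kurmancî û Soranî, 2024!"))
```

Replacing special characters with spaces, rather than deleting them, keeps words separated by punctuation from being merged into a single token. Tokens that mix letters and digits are kept, since only purely numeric tokens are filtered out.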
forked from hosseinhassani/Kurdish-BLARK
KurdishBLARK/Kurdish-BLARK-Basic
Languages
- Python 98.0%
- MATLAB 2.0%