KurdishBLARK/Kurdish-BLARK-Basic

 
 


Kurdish-BLARK

This project consists of two components. The first is a set of basic tools developed as part of the Kurdish BLARK project (see https://www.researchgate.net/profile/Hossein_Hassani11). The second is a set of corpora for the Kurmanji and Sorani dialects of Kurdish. The tools are written in Python (2.7) and currently include: a transliterator that converts texts in Persian/Arabic script into Latin script; a tokenizer that tokenizes texts and uses regular expressions (RE) to remove special characters and numeral tokens; a stemmer that finds Kurmanji and Sorani stems; a word-level literal translator that uses a bidialectal dictionary to translate literally from Kurmanji to Sorani and vice versa; a Kurdish proper-name recognizer; and several other tools for building dictionaries and keeping them sorted. The code includes comments that help explain its logic.
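
To give a rough idea of the tokenizer step described above, here is a minimal sketch of regex-based tokenization that drops special characters and numeral tokens. It is not the code from this repository: the function name `tokenize`, the two patterns, and the sample strings are assumptions made for illustration, and the sketch targets Python 3 (where `\w` and `\d` are Unicode-aware by default) rather than the Python 2.7 used by the project.

```python
import re

# Assumption: a "special character" is anything that is neither a word
# character (letter, digit, underscore) nor whitespace.
SPECIAL_CHARS_RE = re.compile(r"[^\w\s]")

# Assumption: a "numeral token" is a token made up only of decimal digits.
# In Python 3, \d also matches Arabic-Indic digits such as "١٢٣".
NUMERAL_TOKEN_RE = re.compile(r"\d+")


def tokenize(text):
    """Split text into tokens, dropping special characters and numerals."""
    # Replace special characters with spaces so that the words around
    # them are kept as separate tokens.
    cleaned = SPECIAL_CHARS_RE.sub(" ", text)
    # Split on any run of whitespace.
    tokens = cleaned.split()
    # Keep only tokens that are not made up entirely of digits.
    return [t for t in tokens if not NUMERAL_TOKEN_RE.fullmatch(t)]


if __name__ == "__main__":
    # Hypothetical sample input, used only to exercise the function.
    print(tokenize("Ez di sala 2015 de hatim!"))
```

One limitation of this sketch is that characters outside `\w`, such as the zero-width non-joiner sometimes found in Arabic-script text, are treated as special characters and split the word they occur in. A real tokenizer would likely need an explicit rule for them.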

Languages

  • Python 98.0%
  • MATLAB 2.0%