Adding Latvian language support? #682
Comments
Hi! Is there a respected academy, university, or institute that regulates the language? In many countries such academic bodies publish the "big dictionary of X". If you have such a big dictionary of Latvian in a downloadable format, it would be great to share it here. Such dictionaries are spell-checked and contain many different word forms, which results in very good word predictions. I have already developed a strategy for Latin- and Cyrillic-based languages, so my only problem is finding a good dictionary. Since I don't speak that many foreign languages, I can't search foreign websites. I really need a hand with this. I'll take care of the rest of the technical stuff, don't worry.
@sspanak I'm using a source from the language department of the local university. It contains a literary language dictionary, a modern language dictionary and a general dictionary. I also found this one for spell checking. I have these files, but I'm not sure how or where to get the utf8.csv dictionary file (I assume most people don't write thousands of table cells by hand).
The dictionaries link to here. I guess I can download and extract all words from that website. I'll check it out when I have more free time. As for wooorm's dictionaries, I was also optimistic about them at first, but over time I've noticed they contain a lot of misspelled words or words from other languages, even though they are meant to be used for spell checking. I'd rather not use them, or only use small sets of data from them. Anyway, thanks for sharing.
@sspanak I think tezaurs.lv is the main legitimate one available in our country. I have found some others, but they either require payment or clearly state that the language data has been gathered from media (subtitles). The link you shared has an option to email them to request a PostgreSQL database dump instead of the available TEI/XML and LMF/XML formats. If that makes things easier, I could message them to get it for you.
XML format should be fine. I'll let you know if I need anything else.
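For anyone wondering how a word list might be pulled out of a TEI/XML export without typing it by hand, here is a minimal sketch. It assumes headwords sit in `<orth>` elements under the standard TEI namespace, which is the usual TEI dictionary convention but has not been checked against the actual tezaurs.lv files. The function name `extract_headwords` and the sample data are hypothetical, and the layout of the project's own dictionary files is not covered here.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; assumed (not verified) to be what the export uses.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def extract_headwords(tei_xml: str) -> list[str]:
    """Return the unique single-word headwords found in <orth> elements."""
    root = ET.fromstring(tei_xml)
    words = set()
    for orth in root.iter(f"{TEI_NS}orth"):
        text = "".join(orth.itertext()).strip()
        # str.isalpha() accepts letters such as ā, č, ņ and rejects
        # multi-word phrases, digits and punctuation.
        if text and text.isalpha():
            words.add(text)
    return sorted(words)


# Tiny hand-made sample for illustration only, not real dictionary data.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <entry><form><orth>māja</orth></form></entry>
    <entry><form><orth>suns</orth></form></entry>
    <entry><form><orth>labs rīts</orth></form></entry>
    <entry><form><orth>suns</orth></form></entry>
  </body></text>
</TEI>
"""

if __name__ == "__main__":
    # One word per line; adapt to whatever format the project expects.
    print("\n".join(extract_headwords(SAMPLE)))
```

A real dump is likely large, so streaming with `xml.etree.ElementTree.iterparse` would probably be a better fit than loading the whole document at once.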
Hi! Is there a way that I could help to add Latvian language support to T9? I see that there already is Lithuanian, but, unfortunately, our languages are quite different, so I can't really use that. What files are required to make these changes?