Coder1400/LanguageIdentification
Language Modeling: letter & word bi-grams for language identification.

========================== SETUP INSTRUCTIONS ===============================

1.) Clone this repository from https://github.com/Arken94/LanguageIdentification.git

2.) Within the newly cloned repository on your local machine, there should be 3 Python files: “letterLangId.py”, “wordLangId.py”, and “wordLangId2.py”.

3.) letterLangId.py is the letter bigram implementation. wordLangId.py is the word bigram implementation. wordLangId2.py is the word bigram implementation with an advanced smoothing technique (extra credit).

4.) To run any of these Python files, simply run the file using the python command, for example: “python wordLangId.py”. The code will open the proper training and test data files (hardcoded, no arguments to the program are needed) and run the language model for that specific implementation.

5.) NOTE: when you run any of the Python files, MAKE SURE that each of the training files and the test files exists in the same directory that you are running the Python file from. I have included them in the GitHub repository, so they should already be there.

6.) The output of each program is written to an output file with the same name as the Python file, except with a “.out” extension. For example, wordLangId.py will write its output to wordLangId.out. These files should already contain the output of each program.
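For readers who want a feel for what a letter bigram model does before running the scripts, here is a minimal sketch. It is not the code in letterLangId.py: the function names, the toy training strings, and the choice of add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_letter_bigrams(text):
    """Count letter unigrams and letter bigrams in a training text."""
    unigrams = Counter(text)
    bigrams = Counter(zip(text, text[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams, vocab_size):
    """Add-one smoothed log-probability of a sentence under one model."""
    total = 0.0
    for prev, cur in zip(sentence, sentence[1:]):
        numerator = bigrams[(prev, cur)] + 1        # unseen bigrams count as 1
        denominator = unigrams[prev] + vocab_size   # keeps probabilities normalized
        total += math.log(numerator / denominator)
    return total


def identify(sentence, models, vocab_size):
    """Return the language whose model gives the sentence the highest score."""
    return max(models, key=lambda lang: log_prob(sentence, *models[lang], vocab_size))


# Toy training strings, invented for illustration (not the repository's data files).
training = {
    "English": "the cat sat on the mat and the dog ate the bone",
    "French": "le chat est sur le tapis et le chien mange un os",
    "Italian": "il gatto è sul tappeto e il cane mangia un osso",
}
models = {lang: train_letter_bigrams(text) for lang, text in training.items()}
vocab_size = len(set("".join(training.values())))

print(identify("the dog sat on the mat", models, vocab_size))
```

Scores are summed in log space so that long sentences do not underflow to zero. A word bigram model would follow the same shape, with the text split into words instead of iterated character by character.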
About
Distinguishes between English, French, and Italian with 99% accuracy. Uses language modeling techniques including Laplace and Good-Turing smoothing.
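As a rough illustration of the Good-Turing idea mentioned above, the sketch below re-estimates each observed count c as c* = (c + 1) * N[c+1] / N[c], where N[c] is the number of distinct bigrams seen exactly c times. The helper names and toy sentence are hypothetical, and the fallback to the raw count when N[c+1] is zero is a common simplification; the repository's own handling may differ.

```python
from collections import Counter


def good_turing_counts(counts):
    """Map each observed count c to c* = (c + 1) * N[c+1] / N[c].

    Falls back to the raw count when N[c+1] is zero (an assumption
    made here for simplicity).
    """
    freq_of_freq = Counter(counts.values())  # N[c]: how many items occur c times
    adjusted = {}
    for c in freq_of_freq:
        n_next = freq_of_freq.get(c + 1, 0)
        adjusted[c] = (c + 1) * n_next / freq_of_freq[c] if n_next else float(c)
    return adjusted


def unseen_mass(counts):
    """Probability mass reserved for unseen events: N[1] / total observations."""
    total = sum(counts.values())
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / total


# Toy word-bigram counts, invented for illustration.
words = "the cat sat on the mat the cat ate the fish".split()
bigram_counts = Counter(zip(words, words[1:]))

print(good_turing_counts(bigram_counts))
print(unseen_mass(bigram_counts))
```

Compared with add-one smoothing, this approach takes probability mass away from rarely seen bigrams in proportion to how many singletons were observed, rather than adding a fixed constant to every count.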