Version 1.0
Author: Tzoop Ilya (ilyatzoop@gmail.com)
This application is a console version of a SpellCorrector app.
It can work with any language for which a dictionary is available.
Russian and English dictionaries are provided for testing.
You can also create your own dictionary for any language from a text file using dictionary_creator.py
- Python >= 3.6
- Console SpellCorrector:
spellcorrector.py
- Console Dictionary Creator:
dictionary_creator.py
- Console Dictionary Downloader:
dictionary_downloader.py
- Modules:
SpellCorrector/
- Tests:
Tests/
- Used dictionaries:
Correct Dictionaries
spellcorrector.py has 2 modes:
- Misspellings
Prints all mistakes in the following format:
line:index
{'word': 'WORD', 'correction': ['CORR1', 'CORR2']}
  - [-c], [--coordinate] - Adds the coordinate line:index to the output
  - [--correct] amount - Corrects the specified amount of mistakes and prints them
- Mistake Finder
Prints all mistakes without coordinates and corrections.
Example:
./spellcorrector.py --infile Texts\HarryPotterText.txt mistake_finder 10
More detailed information is available with the --help flag.
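If the Misspellings output needs to be consumed by another script, the record format described above can be read back with the standard library. The following is only a minimal sketch: it assumes each record is printed on a single line as a Python dict literal and that a coordinate is two integers separated by a colon. The helper names parse_coordinate and parse_mistake are hypothetical and are not part of this project.

```python
import ast
from typing import Any, Dict, Tuple


def parse_coordinate(text: str) -> Tuple[int, int]:
    """Split a 'line:index' coordinate into two integers (assumed format)."""
    line, index = text.strip().split(":")
    return int(line), int(index)


def parse_mistake(text: str) -> Dict[str, Any]:
    """Parse one mistake record written as a Python dict literal."""
    record = ast.literal_eval(text.strip())
    if not isinstance(record, dict) or "word" not in record:
        raise ValueError(f"unexpected record: {text!r}")
    return record


if __name__ == "__main__":
    # The record below is the placeholder from the format description;
    # the coordinate value is made up for illustration.
    print(parse_coordinate("3:14"))
    print(parse_mistake("{'word': 'WORD', 'correction': ['CORR1', 'CORR2']}"))
```

ast.literal_eval is used instead of eval because it only accepts literals, so a malformed or unexpected line raises an error instead of executing code.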
dictionary_creator.py lets you work with dictionaries. It has 4 modes:
- Add
Adds a word to the chosen dictionary. (You can't just open the dictionary and append a word. A special hash salt is used to prevent corruption and keep words in the desired format.)
- Append
Appends the second dictionary to the first.
- Merge
Merges two dictionaries and stores the result as a separate dictionary.
- Create
Creates a dictionary from a provided text file.
Example:
./dictionary_creator.py add Zeliboba "Correct Dictionaries/large.dic"
The --help flag is also available.
dictionary_downloader.py downloads Russian and English dictionaries and stores them in the Correct Dictionaries folder.
Launch: ./dictionary_downloader.py
The spell corrector supports line breaks, skipped spaces, and misspellings.
It is based on the Levenshtein distance metric.
The module that calculates the Levenshtein distance is located here: SpellCorrector/levenshtein_distance_counter.py
This module uses a trie for fast distance calculation. There is also a heuristic optimization for the same purpose.
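To illustrate why a trie speeds up the distance calculation, here is a minimal, self-contained sketch of the general technique. It is not the code from SpellCorrector/levenshtein_distance_counter.py: the names TrieNode, build_trie and search are hypothetical, and the pruning step is a standard cutoff that may differ from the heuristic optimization the module actually uses. Words that share a prefix share the dynamic-programming rows computed for that prefix, so each row is computed once per trie node rather than once per dictionary word.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set on the node that ends a word


def build_trie(words: List[str]) -> TrieNode:
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word
    return root


def search(root: TrieNode, target: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return (word, distance) pairs with Levenshtein distance <= max_dist."""
    results: List[Tuple[str, int]] = []
    # Row for the empty prefix: turning "" into target[:col] costs col inserts.
    first_row = list(range(len(target) + 1))

    def walk(node: TrieNode, ch: str, prev_row: List[int]) -> None:
        # Extend the DP table by one row for the trie edge labelled `ch`.
        row = [prev_row[0] + 1]
        for col in range(1, len(target) + 1):
            insert_cost = row[col - 1] + 1
            delete_cost = prev_row[col] + 1
            replace_cost = prev_row[col - 1] + (target[col - 1] != ch)
            row.append(min(insert_cost, delete_cost, replace_cost))
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        # Prune: no longer word below this node can have a distance
        # smaller than the minimum of the current row.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    for ch, child in root.children.items():
        walk(child, ch, first_row)
    # Closest candidates first, ties broken alphabetically.
    return sorted(results, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    trie = build_trie(["hello", "help", "held", "world", "word"])
    print(search(trie, "helo", 1))
```

In this sketch the whole subtree under a prefix is skipped as soon as every entry in its row exceeds max_dist, which is what keeps the search from visiting most of the dictionary for small distance limits.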