AI Autocomplete

This is a simple implementation of an autocorrect engine in Python. It is made with Tensorflow and Keras. It is meant to autcorrect or "autocomplete" a word with only given a segment of it.
Here is an example of how it works:

input: hel
output: hello

How it works

When creating this, I made the dataset by hand. I didn't want this to be a fully functional thing but rather a demo. The dataset was created in a CSV format. It has a mistyped/partially typed word on the X or input side, and then the Y or output side has the completed or corrected word. It will preprocess it with the custom preprocessor I made which split the word by character and tokenizes it. I did this because with traditional NLP tokenization methods such as by word would not have worked as efficiently. This is because character by character tokenization helps the Network learn more complex and patterns. This will also improve the accuracy overall. Also, mispelled words can be unique so tokenization by character would help solve the problem of the user entering something not in the vocabulary list. For the Y axis or output, I took all the unique labels and assigned them each a number. Next I would make a Sequential model. It will have the obvious Embeddings layer. Then we will add two LSTM layers one being bidirectional. This will help the network recognize patters on the front and back of the data. It will help it get context on inputs in simple terms. Then the last layer is a Fully connected Dense layer with the softmax activation. What that would do is classify the textual input as one of the labeled words. We will train it wiith the Adam optimizer and Sparse_categorical_crossentropy. This is commonly used for multiclass classification. When I run the model, it will take a misspelled word input from the user. It will then proceed to classify it. The classification is returned in an index where each index is one lablem and the label contains the probability of the prediction being said class. So find the label, we will get the index of the highest percentage in the array. That index will be converted to a label. The run function would proceed to return the label along with the confidence of it being said label.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
utils		utils
README.md		README.md
data.csv		data.csv
model.keras		model.keras
run.py		run.py
training.py		training.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI Autocomplete

How it works

About

Releases

Packages

Languages

HPD1155/word-autocorrect

Folders and files

Latest commit

History

Repository files navigation

AI Autocomplete

How it works

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages