Bangla Word Clustering Based on Tri-gram, 4-gram and 5-gram Language Model

Java | project page | paper

by Hossain Md Saddam, Dipaloke Saha, Sabir Ismail and MD. Saiful Islam

Abstract

In this paper, we describe a research method that generates Bangla word clusters on the basis of semantic and contextual similarity. Word clustering is important for parts of speech (POS) tagging, word sense disambiguation, text classification, recommender systems, spell checkers, grammar checkers, knowledge discovery and many other Natural Language Processing (NLP) applications. Efficient word clustering methods have already been implemented for English and some other languages. Due to a lack of resources, however, word clustering has not yet been implemented efficiently for Bangla, and work on it is still at an early stage. Some research on English word clustering obtained good results using the five words preceding and following a key word. We are implementing tri-gram, 4-gram and 5-gram models of word clustering for Bangla to observe which of them performs best. We have started our research with a fairly large corpus of approximately 1 lakh (100,000) Bangla words. We use a machine learning technique in this research. We will generate word clusters and analyze them by testing several different threshold values.
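The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of clustering words by shared tri-gram context with a similarity threshold. Everything in it is an assumption made for illustration: the class and method names are hypothetical, Jaccard similarity and the greedy single-pass grouping are stand-ins for whatever measure and procedure the authors actually use, and the romanized tokens are toy placeholders rather than corpus data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not the repository's implementation.
class ContextClusterSketch {

    // For every middle word of a tri-gram window, record its
    // (previous word, next word) pair as a context string.
    static Map<String, Set<String>> trigramContexts(List<String> tokens) {
        Map<String, Set<String>> contexts = new LinkedHashMap<>();
        for (int i = 1; i + 1 < tokens.size(); i++) {
            String context = tokens.get(i - 1) + " _ " + tokens.get(i + 1);
            contexts.computeIfAbsent(tokens.get(i), k -> new HashSet<>()).add(context);
        }
        return contexts;
    }

    // Jaccard similarity of two context sets (an assumed similarity measure).
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // Greedy single pass: a word joins the first cluster whose first member
    // is at least `threshold` similar to it, otherwise it starts a new cluster.
    static List<List<String>> cluster(Map<String, Set<String>> contexts, double threshold) {
        List<List<String>> clusters = new ArrayList<>();
        for (String word : contexts.keySet()) {
            List<String> home = null;
            for (List<String> candidate : clusters) {
                String seed = candidate.get(0);
                if (jaccard(contexts.get(word), contexts.get(seed)) >= threshold) {
                    home = candidate;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                clusters.add(home);
            }
            home.add(word);
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Romanized toy tokens standing in for a Bangla corpus.
        List<String> tokens = Arrays.asList("ami bhat khai . ami ruti khai .".split(" "));
        Map<String, Set<String>> contexts = trigramContexts(tokens);
        System.out.println(cluster(contexts, 0.5));
    }
}
```

In this toy input, `bhat` and `ruti` both occur in the context `ami _ khai`, so they fall into the same cluster, while every other word stays on its own. A 4-gram or 5-gram variant would widen the context window, and raising or lowering the threshold changes how readily words are merged.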

Ibn-Ahmad68/Bangla-Word-Clustering
