PyTorch deep learning model that detects toxic Vietnamese sentences using BERT

hoangcaobao/Vietnamese-Toxic-Comment-Classifier

VietnameseTextToxicClassify

I trained this model using PyTorch to detect toxic comments for Projectube.

I used VnCoreNLP to word-segment the raw Vietnamese sentences and PhoBERT as the base model for text classification. Both are used as described in https://github.com/VinAIResearch/PhoBERT.
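The preprocessing and classification flow described above could be sketched roughly as follows. This is not the code in training.py: the checkpoint name "vinai/phobert-base", the two-label head, the 256-token limit, and the helper names `to_phobert_input` and `classify` are assumptions made for illustration. The classification head is freshly initialised here, so predictions are meaningless until the model has been fine-tuned.

```python
import os
from typing import List


def to_phobert_input(segmented: List[List[str]]) -> str:
    """Join VnCoreNLP output (one token list per sentence) into one string.

    Multi-syllable words arrive already joined with underscores,
    which is the input form PhoBERT expects.
    """
    return " ".join(" ".join(tokens) for tokens in segmented)


def classify(text: str, jar_path: str = "vncorenlp/VnCoreNLP-1.1.1.jar") -> int:
    """Segment `text`, encode it, and return the index of the top-scoring class.

    Assumption: an untrained 2-label head on "vinai/phobert-base".
    """
    # Imported lazily so the helper above works without Java or model downloads.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from vncorenlp import VnCoreNLP

    tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "vinai/phobert-base", num_labels=2
    )
    model.eval()

    # The context manager shuts down the Java server that the wrapper starts.
    with VnCoreNLP(os.path.abspath(jar_path), annotators="wseg",
                   max_heap_size="-Xmx500m") as segmenter:
        segmented = segmenter.tokenize(text)

    encoded = tokenizer(to_phobert_input(segmented), return_tensors="pt",
                        truncation=True, max_length=256)
    with torch.no_grad():
        logits = model(**encoded).logits
    return int(logits.argmax(dim=-1).item())
```

Calling `classify("...")` requires Java, the VnCoreNLP files, and network access to download the checkpoint; `to_phobert_input` can be used on its own.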

Use my code

1. Clone the repository:

git clone https://github.com/hoangcaobao/Vietnamese_Text_Toxic_Classify.git

2. Change into the repository folder and install VnCoreNLP:

cd Vietnamese_Text_Toxic_Classify
pip3 install vncorenlp
mkdir -p vncorenlp/models/wordsegmenter
wget https://raw.githubusercontent.com/vncorenlp/VnCoreNLP/master/VnCoreNLP-1.1.1.jar
wget https://raw.githubusercontent.com/vncorenlp/VnCoreNLP/master/models/wordsegmenter/vi-vocab
wget https://raw.githubusercontent.com/vncorenlp/VnCoreNLP/master/models/wordsegmenter/wordsegmenter.rdr
mv VnCoreNLP-1.1.1.jar vncorenlp/
mv vi-vocab vncorenlp/models/wordsegmenter/
mv wordsegmenter.rdr vncorenlp/models/wordsegmenter/

3. Add more training data to the 2 JSON files.

4. Run the training script:

python3 training.py
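Before running the training script, it may help to confirm that the files downloaded in step 2 ended up where the mv commands put them. The check below is a minimal sketch and is not part of the repository; the function name `missing_vncorenlp_files` is hypothetical, and the paths are taken directly from the commands in step 2.

```python
from pathlib import Path
from typing import List

# Relative paths produced by the mkdir/mv commands in step 2.
EXPECTED_FILES = [
    "vncorenlp/VnCoreNLP-1.1.1.jar",
    "vncorenlp/models/wordsegmenter/vi-vocab",
    "vncorenlp/models/wordsegmenter/wordsegmenter.rdr",
]


def missing_vncorenlp_files(root: str = ".") -> List[str]:
    """Return the expected VnCoreNLP files that are absent under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_vncorenlp_files()
    if missing:
        print("Missing VnCoreNLP files:", ", ".join(missing))
    else:
        print("VnCoreNLP layout looks complete.")
```

Run it from the repository folder; an empty result means all three files are in place.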

Bao Hoang
