Team 5 - BERT Brigade - Binary Machine-Generated Text Detection

By Dhyey Nilesh Doshi (ID: 40244534)

Abstract

This project implements a Naive Bayes classifier and fine-tunes RoBERTa-based and DistilBERT-based models, comparing how accurately each distinguishes human-written from AI-generated text. Using transfer learning with the RoBERTa-base architecture, fine-tuned on a diverse dataset, the model reaches 75.5% test accuracy in detecting machine-generated content. DistilBERT reaches a slightly higher test accuracy of 78%. Keeping stopwords in the input raises RoBERTa's accuracy to 80.7%. The probabilistic baseline, Naive Bayes, reaches 68% accuracy. The solution offers a tool for verifying content and mitigating the risks associated with AI-generated content.
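To make the probabilistic baseline concrete, here is a minimal sketch of a multinomial Naive Bayes classifier over bag-of-words counts, with an optional stopword filter. It is not the code from the project notebooks: the class name `NaiveBayesBOW`, the tiny `STOPWORDS` set, the tokenizer, and the toy training sentences are all assumptions made for illustration, and the standard library is used in place of whatever toolkit the notebooks rely on.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def tokenize(text, remove_stopwords=True):
    """Lowercase the text and split it into word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


class NaiveBayesBOW:
    """Multinomial Naive Bayes on bag-of-words counts, add-one smoothing."""

    def __init__(self, remove_stopwords=True):
        self.remove_stopwords = remove_stopwords

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text, self.remove_stopwords)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(token | class) per class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + vocab_size
            score = math.log(n_docs / total_docs)
            for token in tokenize(text, self.remove_stopwords):
                if token in self.vocab:  # skip words never seen in training
                    score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy data invented for this sketch; not the project's dataset.
train_texts = [
    "honestly i loved the movie lol",
    "we grabbed coffee and chatted for hours",
    "as an ai language model i cannot provide opinions",
    "in conclusion the model provides a comprehensive overview",
]
train_labels = ["human", "human", "ai", "ai"]

clf = NaiveBayesBOW(remove_stopwords=True).fit(train_texts, train_labels)
print(clf.predict("as an ai language model i cannot help"))
print(clf.predict("honestly we loved coffee"))
```

Constructing the classifier with `remove_stopwords=False` keeps function words in the counts, which is the kind of preprocessing switch the abstract reports on for RoBERTa. The toy data is far too small to say anything about accuracy.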

Repository files

LICENSE
Notebook-0 NaiveBayes-BOW.ipynb
Notebook-3 (RoBERTa).ipynb
Notebook-4 (DistilBERT).ipynb
Notebook-5 (RoBERTa with stopwords included).ipynb
PR_ROC_plot.ipynb
Project_Report.pdf
README.md
Team5-Poster.pdf

dhy3y/HumanVsAI-Text-Detection
