dhy3y/HumanVsAI-Text-Detection

Team 5 - BERT Brigade - Binary Machine-Generated Text Detection

By Dhyey Nilesh Doshi (ID: 40244534)

Abstract

This project implements a Naive Bayes classifier and fine-tunes RoBERTa-based and DistilBERT-based models, comparing how accurately each distinguishes human-written from AI-generated text. Using transfer learning with the RoBERTa-base architecture, fine-tuned on a diverse dataset, the RoBERTa model reaches 75.5% test accuracy. DistilBERT does slightly better at 78% test accuracy. When stopwords are not removed, RoBERTa's accuracy rises to 80.7%. The probabilistic baseline, Naive Bayes, reaches 68% accuracy. The resulting classifier offers a tool for verifying content and mitigating the risks associated with AI-generated content.
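To make the Naive Bayes baseline and the stopword-removal switch concrete, here is a minimal pure-Python sketch of a multinomial Naive Bayes text classifier with Laplace smoothing. It is not the project's code: the `STOPWORDS` set, the `tokenize` helper, the `NaiveBayes` class, and the four training sentences are all hypothetical, and the repository may well use a library implementation and a different preprocessing pipeline.

```python
import math
from collections import Counter

# Hypothetical stopword list for illustration only; the project's list is not given.
STOPWORDS = {"the", "a", "is", "of", "and", "to", "in"}


def tokenize(text, remove_stopwords=False):
    """Lowercase, split on whitespace, and optionally drop stopwords."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels, remove_stopwords=False):
        self.remove_stopwords = remove_stopwords
        self.class_counts = Counter(labels)  # documents per class
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text, remove_stopwords))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text, self.remove_stopwords):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data; the real dataset is far larger.
train_texts = [
    "honestly i loved this movie lol",
    "my cat knocked over the lamp again",
    "as an ai language model i cannot",
    "in conclusion it is important to note",
]
train_labels = ["human", "human", "ai", "ai"]

model = NaiveBayes().fit(train_texts, train_labels)
pred_ai = model.predict("it is important to note")
pred_human = model.predict("my cat loved the movie")
print(pred_ai, pred_human)
```

Passing `remove_stopwords=True` to `fit` applies the same filtering at training and prediction time, which is one simple way to compare the two preprocessing settings on the same data.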
