By Dhyey Nilesh Doshi (ID: 40244534)
This project implements a Naive Bayes classifier and introduces RoBERTa-based and DistilBERT-based models, comparing the three approaches on the task of differentiating human-written from AI-generated text. Using transfer learning with the RoBERTa-base architecture, fine-tuned on a diverse dataset, the RoBERTa model achieves 75.5% test accuracy in detecting machine-generated content. The DistilBERT model achieves a slightly higher test accuracy of 78%. Keeping stopwords in the input instead of removing them raises RoBERTa's accuracy to 80.7%. The probabilistic Naive Bayes classifier, by comparison, achieves 68% accuracy. This solution offers a tool for verifying content and mitigating the risks associated with AI-generated content.
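The Naive Bayes baseline mentioned above can be illustrated with a multinomial Naive Bayes classifier over word counts with add-one (Laplace) smoothing. The following is a minimal, standard-library-only sketch. It rests on several assumptions. The whitespace tokenizer, the tiny `STOPWORDS` set, the `tokenize` and `MultinomialNB` names, and the four toy training sentences are all hypothetical and are not taken from the project. The project's actual feature extraction and libraries are not stated. The `remove_stopwords` flag only mirrors the stopword-removal preprocessing choice that the abstract compares.

```python
import math
from collections import Counter

# Hypothetical stopword list for illustration only; the project's real list is not specified.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "in", "i"}


def tokenize(text, remove_stopwords=True):
    """Lowercase, split on whitespace, and optionally drop stopwords (assumed preprocessing)."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        # docs: list of token lists; labels: list of class names such as "human" / "ai".
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, y in zip(docs, labels):
            self.word_counts[y].update(tokens)
        self.vocab = set()
        self.log_prior, self.total = {}, {}
        for c in self.classes:
            self.vocab.update(self.word_counts[c])
            self.log_prior[c] = math.log(labels.count(c) / len(labels))
            self.total[c] = sum(self.word_counts[c].values())
        return self

    def log_score(self, tokens, c):
        # log P(c) + sum over tokens of log P(token | c); words never seen in training are skipped.
        v = len(self.vocab)
        s = self.log_prior[c]
        for t in tokens:
            if t in self.vocab:
                p = (self.word_counts[c][t] + self.alpha) / (self.total[c] + self.alpha * v)
                s += math.log(p)
        return s

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self.log_score(tokens, c))


# Toy, made-up training data; not the project's dataset.
train_texts = [
    "i went to the market and bought some fresh bread",
    "honestly the movie was kind of boring to me",
    "as an ai language model i can provide a comprehensive overview",
    "in conclusion it is important to note the key considerations",
]
labels = ["human", "human", "ai", "ai"]

model = MultinomialNB().fit([tokenize(t) for t in train_texts], labels)
print(model.predict(tokenize("it is important to provide a comprehensive overview")))  # → ai
print(model.predict(tokenize("the movie was boring")))  # → human
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied. Passing `remove_stopwords=False` to `tokenize` for both training and prediction reproduces the "keep stopwords" variant of the preprocessing.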