Artificial intelligence (AI) has become a widely used technology, largely because of the emergence of intelligent chatbots such as OpenAI's ChatGPT. People use ChatGPT for many purposes; students, for example, use it to understand course material, complete assignments, compose essays, and paraphrase journal articles. Paraphrasing with ChatGPT and presenting the paraphrased text as one's own writing in a paper can be considered a form of plagiarism. The problem is that determining whether a text is AI-generated or human-written requires a lengthy, in-depth study of the patterns and word arrangement in the text. A system that can detect whether a text was generated by AI is therefore needed.

This text detection system uses a deep learning approach. Human-written text was collected from the Detik news portal and the Quora question-and-answer website, and AI text was generated by paraphrasing the human-written text. Vectorization uses Doc2Vec and the BERT tokenizer. The models studied were LSTM, GRU, Bi-LSTM, Bi-GRU, and BERT with the IndoBERT pre-trained model. Of the five models, BERT achieved the best accuracy on the training data, while Bi-LSTM and Bi-GRU achieved the best accuracy on the validation data.
- Modeling: NumPy, Pandas, Scikit-learn, Gensim, TensorFlow, PyTorch, Hugging Face
- Web Application: Flask, jQuery, Tailwind CSS
These instructions will guide you through installing the project on your local machine for testing purposes. (Note: This project uses large file storage (LFS), so please be patient; processing may take several minutes.)
This project requires Python 3.10.5.
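Before continuing, it can help to confirm which interpreter is active. The snippet below is a minimal sketch, not part of the project: the names `REQUIRED` and `is_supported` are hypothetical, and it assumes that matching the major and minor version (3.10) is sufficient, even though the requirement above names 3.10.5 exactly.

```python
import sys

# Major and minor version taken from the stated requirement (3.10.5).
# Assumption: any 3.10.x patch release behaves the same for this project.
REQUIRED = (3, 10)


def is_supported(version_info=sys.version_info):
    """Return True when the interpreter's major.minor matches REQUIRED."""
    return tuple(version_info[:2]) == REQUIRED


if not is_supported():
    found = sys.version.split()[0]
    print(f"Warning: expected Python {REQUIRED[0]}.{REQUIRED[1]}.x, found {found}")
```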
Clone this repository
git clone https://github.com/kevin-wijaya/AI-Generated-Text-Detection-with-Deep-Learning-Approach-on-Indonesian-Text.git
Rename the folder and change directory into it
mv AI-Generated-Text-Detection-with-Deep-Learning-Approach-on-Indonesian-Text ai-text-detection && cd ai-text-detection
Initialize the python environment to ensure isolation
python -m venv .venv
Install prerequisite python packages
python run.py pip install -r requirements.txt
Download the required model files (uses gdown and Git LFS)
gdown --folder 19fi_oNv42G5n27bO-W1f03PDPYckjgPX -O ./models/ && git lfs install && git clone https://huggingface.co/indolem/indobert-base-uncased ./models/indolem/indobert-base-uncased
Run app.py through run.py and enjoy 😁
python run.py app
Using the web application is easy. Follow these three steps:
- Insert Text: Enter your text into the textarea provided.
- Detect: Click on the "Detect" button to process the text and obtain results.
- Change Models: Optionally, select a different model from the available options to compare results.
Below is a table showing the evaluation metrics from the experiments conducted:
Model | Precision (%) | Recall (%) | F1-Score (%) | Accuracy (%) |
---|---|---|---|---|
LSTM | 75 | 75 | 75 | 75 |
GRU | 71 | 72 | 71 | 71 |
Bi-LSTM | 77 | 77 | 77 | 77 |
Bi-GRU | 77 | 77 | 77 | 77 |
IndoBERT | 71 | 71 | 71 | 71 |
Model | Label | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
LSTM | Human | 72 | 77 | 75 |
LSTM | AI | 78 | 74 | 76 |
GRU | Human | 62 | 77 | 68 |
GRU | AI | 81 | 68 | 77 |
Bi-LSTM | Human | 73 | 79 | 76 |
Bi-LSTM | AI | 80 | 75 | 78 |
Bi-GRU | Human | 72 | 80 | 76 |
Bi-GRU | AI | 82 | 74 | 78 |
IndoBERT | Human | 67 | 73 | 70 |
IndoBERT | AI | 75 | 69 | 72 |
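
The precision, recall, F1-score, and accuracy columns in the tables above follow the standard definitions. As a minimal sketch of how such figures are derived from a set of predictions, the snippet below computes them in plain Python. The function name `per_label_metrics` and the toy labels are hypothetical and are not taken from this project's code or data; Scikit-learn, which is listed among the project's libraries, provides equivalent functionality.

```python
from collections import Counter


def per_label_metrics(y_true, y_pred, labels):
    """Return precision, recall, and F1 per label, plus overall accuracy."""
    # Count each (true label, predicted label) pair once.
    pairs = Counter(zip(y_true, y_pred))
    report = {}
    for label in labels:
        tp = pairs[(label, label)]
        # False positives: predicted as `label` but actually another class.
        fp = sum(n for (t, p), n in pairs.items() if p == label and t != label)
        # False negatives: actually `label` but predicted as another class.
        fn = sum(n for (t, p), n in pairs.items() if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    correct = sum(n for (t, p), n in pairs.items() if t == p)
    report["accuracy"] = correct / len(y_true)
    return report


# Hypothetical toy labels, used only to illustrate the calculation.
y_true = ["Human", "Human", "Human", "Human", "AI", "AI", "AI", "AI"]
y_pred = ["Human", "Human", "Human", "AI", "AI", "AI", "Human", "Human"]

report = per_label_metrics(y_true, y_pred, labels=["Human", "AI"])
for name in ("Human", "AI"):
    scores = report[name]
    print(name, {k: round(v * 100) for k, v in scores.items()})
print("accuracy", round(report["accuracy"] * 100))
```

Multiplying each value by 100 and rounding gives percentages in the same form as the tables. Because F1 is computed from unrounded precision and recall, recomputing it from already rounded table values can differ slightly.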
Here are some screenshots of the application:
- Kevin Wijaya