gperdrizet/llm_detector

Ask Agatha: synthetic text detection service

News

2024-08-27: Malone (now agatha) has joined the Google Cloud for Startups program! Lots of excitement here - this success provides significant recognition and compute resources to the project. For now, the only visible change will be a rename of the project to 'Ask Agatha', with the model being colloquially referred to as 'agatha'. The LLM detector is still available on Telegram via @ask_agatha_bot. Please direct any inquiries to gperdrizet@ask-agatha.com.

2024-08-17: Malone is temporarily off-line so that compute resources can be dedicated to benchmarking and improvements to the classifier. Check out what is going on in the benchmarking and classifier notebooks on the classifier branch. If you would really like to try malone out, get in touch and I will fire it up for you.

2024-08-07: Malone was just named a Backdrop Build v5 Finalist! Check out the build page here! Let's gooooo!

2024-08-01: Backdrop Build v5 launch video is up on YouTube. Congrats to all of the other Backdrop Build finishers!

2024-07-30: Malone is live in Beta on Telegram, give it a try here. Note: some Firefox users have reported issues with the botlink page - seems to be a Telegram issue, not a malone issue. You can also find malone by messaging '/start' to @the_malone_bot anywhere you use Telegram.

2024-07-08: llm_detector is officially part of the Backdrop Build v5 cohort under the tentative name 'malone' starting today. Check out the backdrop build page for updates.

Project description

agatha

Agatha is a synthetic text detection service available on Telegram Messenger, written in Python using HuggingFace, scikit-learn, XGBoost, Luigi and python-telegram-bot, supported by Flask, Celery, Redis & Docker and served via Gunicorn and Nginx. Agatha uses an in-house trained gradient boosting classifier to estimate the probability that a given text was generated by an LLM. The classifier works on a set of engineered features derived from the input text; for more details, see the feature engineering notebooks.

Table of Contents

  1. Features
  2. Where to find agatha
  3. Usage
  4. Performance
  5. Demonstration/experimentation notebooks
  6. About the author
  7. Disclaimer

1. Features

  • Easily accessible - use it anywhere you can access Telegram: iOS or Android apps and any web browser.
  • Simple interface - no frills, just send the bot text and it will send back the probability that the text was machine generated.
  • Useful and accurate - provides a probability that text is synthetic, allowing users to make their own decisions when evaluating content. Maximum likelihood classification accuracy ~98% on held-out test data.
  • Model agnostic - agatha is not trained to detect the output of a specific LLM. Instead, it uses a gradient boosting classifier and a set of numerical features derived from/calibrated on a large corpus of human and synthetic text samples from multiple LLMs.
  • No logs - no user data or message contents are ever persisted to disk.
  • Open source codebase - agatha is an open source project. Clone it, fork it, extend it, modify it, host it yourself and use it the way you want to use it.
  • Free

2. Where to find agatha

Agatha is publicly available on Telegram. You can find agatha via the Telegram bot page, or just message @ask_agatha_bot with '/start' to start using it.

There are also plans in the works to offer the bare API to interested parties. If that's you, see section 6 below.

3. Usage

To use agatha you will need a Telegram account. Telegram is free to use and available as an app for iOS and Android. There is also a web version for desktop use.

Once you have a Telegram account, agatha is simple to use. Send the bot any 'suspect' text and it will reply with the probability that the text in question was written by a human or generated by an LLM. For smartphone use, a good trick is to long-press on 'suspect' text and then share it to agatha's contact on Telegram via the context menu. Agatha is never more than 2 taps away!

telegram app screenshot

Agatha can run in two response modes: 'default' and 'verbose'. Default mode returns the probability associated with the most likely class as a percent (e.g. 75% chance a human wrote this). Verbose mode gives a little more detail about the feature values and prediction metrics. Set the mode by messaging '/set_mode verbose' or '/set_mode default'.
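As a rough illustration of the two behaviours described above, the sketch below parses a '/set_mode' message and formats a default-mode reply. The function names (`parse_set_mode`, `format_default_reply`), the 0.5 decision threshold and the exact reply wording for the LLM case are assumptions made for this example, not agatha's actual code.

```python
from typing import Optional

VALID_MODES = {"default", "verbose"}


def parse_set_mode(message: str) -> Optional[str]:
    """Return the requested mode for a '/set_mode <mode>' message, else None."""
    parts = message.strip().split()
    if len(parts) == 2 and parts[0] == "/set_mode" and parts[1] in VALID_MODES:
        return parts[1]
    return None


def format_default_reply(p_synthetic: float) -> str:
    """Report the probability of the most likely class as a percent.

    p_synthetic is the (hypothetical) classifier output: the estimated
    probability that the text was generated by an LLM.
    """
    if not 0.0 <= p_synthetic <= 1.0:
        raise ValueError("p_synthetic must be between 0 and 1")
    # Assumed threshold: ties at 0.5 are reported as the LLM class.
    if p_synthetic >= 0.5:
        return f"{p_synthetic:.0%} chance an LLM wrote this"
    return f"{1.0 - p_synthetic:.0%} chance a human wrote this"


print(parse_set_mode("/set_mode verbose"))
print(format_default_reply(0.25))
```

Reporting only the most likely class keeps the default reply to a single line; verbose mode would add the feature values on top of this.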

For best results, submitted text should be between 50 and 500 words.
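If you are preparing text programmatically, a quick length check along these lines can help. This is a minimal sketch: the helper name `check_length` is hypothetical, and counting words by splitting on whitespace is an assumption that may differ from how agatha counts them.

```python
from typing import Tuple

MIN_WORDS = 50   # lower bound of the recommended range
MAX_WORDS = 500  # upper bound of the recommended range


def check_length(text: str) -> Tuple[int, bool]:
    """Return (word_count, in_recommended_range) for a piece of text."""
    # Assumption: a 'word' is any whitespace-delimited token.
    n_words = len(text.split())
    return n_words, MIN_WORDS <= n_words <= MAX_WORDS


n_words, ok = check_length("just a few words")
if not ok:
    print(f"{n_words} words: outside the recommended {MIN_WORDS}-{MAX_WORDS} range")
```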

4. Performance

Agatha is roughly 97.5% accurate or better on held-out test data, depending on the length of the submitted text (see the example confusion matrix below). Classification accuracy is lowest on short text and best on text >= 150 words. The misclassified examples are more or less evenly split between false negatives and false positives.

XGBoost confusion matrix
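For readers less familiar with confusion matrices, the sketch below shows how accuracy and the false positive/false negative rates are computed from the four cells. It is illustrative only: the function name `confusion_metrics` is hypothetical, treating 'synthetic' as the positive class is an assumption, and the counts in the usage line are toy placeholders, not agatha's results.

```python
from typing import Dict


def confusion_metrics(tn: int, fp: int, fn: int, tp: int) -> Dict[str, float]:
    """Compute summary metrics from the cells of a binary confusion matrix.

    Assumed convention: positive = synthetic text, negative = human text.
    """
    total = tn + fp + fn + tp
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {
        # Fraction of all examples classified correctly.
        "accuracy": (tp + tn) / total,
        # Fraction of human texts wrongly flagged as synthetic.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Fraction of synthetic texts wrongly passed as human.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Toy counts for illustration only (not taken from the project).
print(confusion_metrics(tn=45, fp=5, fn=5, tp=45))
```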

For more details on the classifier training and performance see the following notebooks:

  1. Stage I length binned classifier
  2. Stage II length binned classifier
  3. v2.0 classifier finalized

5. Demonstration/experimentation notebooks

These notebooks are the best way to understand the approach and the engineered features used to train the classifier.

  1. Perplexity ratio data
  2. Perplexity ratio score
  3. TF-IDF score
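As background for the perplexity ratio notebooks, here is a minimal sketch of the standard perplexity definition (the exponential of the negative mean token log-probability) and a ratio between two scorings of the same text. Everything about the ratio here is an assumption for illustration: which two models are compared, the direction of the ratio and the helper names are hypothetical, so refer to the notebooks for the feature as actually engineered.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-mean_logprob)


def perplexity_ratio(logprobs_a: Sequence[float],
                     logprobs_b: Sequence[float]) -> float:
    """Ratio of the perplexities assigned to the same text by two scorers.

    Assumption: scorer A is the numerator and scorer B the denominator.
    """
    return perplexity(logprobs_a) / perplexity(logprobs_b)


# Hypothetical log-probabilities: scorer A gives each token p = 0.5,
# scorer B gives each token p = 0.25.
scores_a = [math.log(0.5)] * 4
scores_b = [math.log(0.25)] * 4
print(perplexity_ratio(scores_a, scores_b))
```

In practice the log-probabilities would come from language models scoring the submitted text; the sketch takes them as plain lists to stay independent of any model library.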

6. About the author

My name is Dr. George Perdrizet. I am a biochemistry & molecular biology PhD seeking a career step from academia to professional data science and/or machine learning engineering. This project was conceived from the scientific literature and built solo over the course of a few weeks - I strongly believe that I have a lot to offer the right organization. If you or anyone you know is interested in an ex-researcher from University of Chicago turned builder and data scientist, please reach out. I'd love to learn from and contribute to your project.

7. Disclaimer

Agatha is an experimental research project meant for educational, informational and entertainment purposes only. All predictions are probabilistic in nature and subject to stochastic errors. Text classifications, no matter how high or low the reported probability, should not be interpreted as definitive proof of authorship or lack thereof.