Classifying Human and Machine Generated Text

Norrec Nieh, Frank Rossi, & Jason Zhang
Khoury College of Computer Sciences
Northeastern University
Boston, MA 02115

nieh.c@northeastern.edu, rossi.f@northeastern.edu, zhang.haozhe1@northeastern.edu

Abstract

We investigated whether perplexity can be used to classify texts as human- or machine-generated, using two approaches: a single perplexity score for the input text, and a sequence of word probabilities corresponding to the input text. The former was classified against a single threshold; the latter was fed into an artificial neural network (ANN). Both the perplexities and the probabilities were generated using N-gram models. Our best result for the single-score threshold classifier was 77%, and our best result for the probability-sequence ANN classifier was 80%. Our work demonstrates that perplexity can serve as a feature for distinguishing human and machine texts, even with basic classifiers.
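
As a rough illustration of the single-score approach, the sketch below computes per-word probabilities from a bigram model, reduces them to a perplexity, and compares that perplexity against a threshold. It is a minimal sketch under stated assumptions, not the authors' implementation: the bigram order, the add-one smoothing, the function names, and the direction of the decision rule (lower perplexity labeled "machine") are all chosen here for illustration. The list returned by word_probabilities is the kind of sequence the second approach would pass to a neural network.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        padded = ["<s>"] + sentence + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def word_probabilities(tokens, unigrams, bigrams):
    """Return P(word | previous word) for each position, with add-one smoothing.

    Assumption: add-one (Laplace) smoothing; the source does not say which
    smoothing scheme, if any, was used.
    """
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words
    padded = ["<s>"] + tokens + ["</s>"]
    return [
        (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        for prev, word in zip(padded, padded[1:])
    ]


def perplexity(probs):
    """Perplexity: exp of the negative mean log-probability."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def classify(ppl, threshold):
    """Label a text from its perplexity.

    Assumption: text the model finds more predictable (lower perplexity)
    is labeled machine-generated. The threshold would be tuned on held-out data.
    """
    return "machine" if ppl < threshold else "human"


# Toy usage with a hypothetical two-sentence training corpus.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, bigrams = train_bigram(corpus)
probs = word_probabilities(["the", "cat", "sat"], unigrams, bigrams)
label = classify(perplexity(probs), threshold=5.0)
```

In practice the threshold would be selected by sweeping candidate values on a labeled validation set and keeping the one that gives the best classification result.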
