MAGE: Machine-generated Text Detection in the Wild

Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Zhilin Wang, Longyue Wang, Linyi Yang, Shuming Shi, Yue Zhang

2023-05-22Text Generation Story Generation Face Swapping Binary text classification Text Detection

Abstract

Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection to mitigate risks like the spread of fake news and plagiarism. Existing research has been constrained by evaluating detection methods on specific domains or particular language models. In practical scenarios, however, the detector faces texts from various domains or LLMs without knowing their sources. To this end, we build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs. Empirical results show challenges in distinguishing machine-generated texts from human-authored ones across various scenarios, especially out-of-distribution. These challenges are due to the decreasing linguistic distinctions between the two sources. Despite challenges, the top-performing detector can identify 86.54% out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios. We release our resources at https://github.com/yafuly/MAGE.

Results

Task	Dataset	Metric	Value	Model
Binary text classification	MAGE (Arbitrary-domains & Arbitrary-models)	Average Recall	0.9053	Longformer

Related Papers

Making Language Model a Hierarchical Classifier and Generator2025-07-17 SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks2025-07-17 Mitigating Object Hallucinations via Sentence-Level Early Intervention2025-07-16 MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM2025-07-16 The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs2025-07-15 Seq vs Seq: An Open Suite of Paired Encoders and Decoders2025-07-15 Hashed Watermark as a Filter: Defeating Forging and Overwriting Attacks in Weight-based Neural Network Watermarking2025-07-15 Exploiting Leaderboards for Large-Scale Distribution of Malicious Models2025-07-11