Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

GigaCheck: Detecting LLM-generated Content

Irina Tolstykh, Aleksandra Tsybina, Sergey Yakubson, Aleksandr Gordeev, Vladimir Dokholyan, Maksim Kuprashevich

2024-10-31
Tasks: Text Classification, LLM-generated Text Detection, Boundary Detection, Binary text classification, Text Detection

Abstract

With the increasing quality and spread of LLM-based assistants, the amount of LLM-generated content is growing rapidly. In many cases and tasks, such texts are already indistinguishable from those written by humans, and generation quality continues to improve. At the same time, detection methods are developing more slowly, making it challenging to prevent misuse of generative AI technologies. In this work, we investigate the task of generated text detection by proposing GigaCheck. Our research explores two approaches: (i) distinguishing human-written texts from LLM-generated ones, and (ii) detecting LLM-generated intervals in Human-Machine collaborative texts. For the first task, our approach utilizes a general-purpose LLM, leveraging its extensive language abilities to fine-tune efficiently for the downstream task of LLM-generated text detection, achieving high performance even with limited data. For the second task, we propose a novel approach that combines computer vision and natural language processing techniques. Specifically, we use a fine-tuned general-purpose LLM in conjunction with a DETR-like detection model, adapted from computer vision, to localize AI-generated intervals within text. We evaluate GigaCheck on five classification datasets with English texts and three datasets designed for Human-Machine collaborative text analysis. Our results demonstrate that GigaCheck outperforms previous methods, even in out-of-distribution settings, establishing a strong baseline across all datasets.
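
To make the second task more concrete, here is a minimal sketch of how LLM-generated intervals in a text might be represented and compared. It is not the authors' implementation. It assumes that per-token binary labels are available, that each interval is encoded as a normalized (center, width) pair by analogy with the box parameterization used in DETR-style detectors, and that predicted and reference intervals are compared by 1D intersection-over-union. The function names `labels_to_spans`, `to_center_width`, and `interval_iou` are hypothetical.

```python
from typing import List, Tuple

# Half-open interval [start, end) over token positions (an assumption of this sketch).
Span = Tuple[int, int]


def labels_to_spans(labels: List[int]) -> List[Span]:
    """Collapse per-token 0/1 labels (1 = LLM-generated) into maximal spans."""
    spans: List[Span] = []
    start = None
    for i, lab in enumerate(labels):
        if lab == 1 and start is None:
            start = i                      # a generated run begins here
        elif lab != 1 and start is not None:
            spans.append((start, i))       # the run ended at the previous token
            start = None
    if start is not None:                  # the run extends to the end of the text
        spans.append((start, len(labels)))
    return spans


def to_center_width(span: Span, length: int) -> Tuple[float, float]:
    """Encode a span as (center, width), both normalized by the text length."""
    start, end = span
    return ((start + end) / 2 / length, (end - start) / length)


def interval_iou(a: Span, b: Span) -> float:
    """1D intersection-over-union between two half-open intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    labels = [0, 1, 1, 0, 0, 1, 0, 0]
    for span in labels_to_spans(labels):
        print(span, to_center_width(span, len(labels)))
```

A (center, width) encoding keeps every target in the range [0, 1] regardless of text length, which is one plausible reason a detector borrowed from computer vision could be applied to text with little change.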

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Boundary Detection | RoFT-chatgpt | Accuracy (%) | 67.65 | GigaCheck (DN-DAB-DETR) |
| Boundary Detection | RoFT-chatgpt | MSE | 1.03 | GigaCheck (DN-DAB-DETR) |
| Boundary Detection | RoFT | Accuracy (%) | 64.63 | GigaCheck (DN-DAB-DETR) |
| Boundary Detection | RoFT | MSE | 1.51 | GigaCheck (DN-DAB-DETR) |
| Boundary Detection | CoAuthor | Cohen’s Kappa score | 0.4158 | GigaCheck (Mistral-7B-v0.3) |
| Boundary Detection | CoAuthor | Cohen’s Kappa score | 0.1885 | GigaCheck (DN-DAB-DETR) |
| Boundary Detection | TriBERT (in-domain) | F1@3 | 0.646 | GigaCheck (DN-DAB-DETR) |
| Binary text classification | TURINGBENCH (Turing Test, FAIR_wmt20) | F1 score | 0.9966 | GigaCheck (Mistral-7B) |
| Binary text classification | TURINGBENCH (Turing Test, GPT-3) | F1 score | 0.9709 | GigaCheck (Mistral-7B) |
| Binary text classification | MAGE (Arbitrary-domains & Arbitrary-models) | Average Recall | 0.9611 | GigaCheck (Mistral-7B) |
| Binary text classification | TweepFake | Accuracy (%) | 94.3 | GigaCheck (Mistral-7B) |
| Binary text classification | TweepFake | F1 score | 0.942 | GigaCheck (Mistral-7B) |
| Binary text classification | Ghostbuster (All Domains) | F1 score | 1 | GigaCheck (Mistral-7B) |
| Binary text classification | MixSet (Binary) | F1 score | 0.99 | GigaCheck (Mistral-7B) |
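
For readers unfamiliar with the metrics reported in the results above, the following sketch shows the standard textbook definitions of accuracy, binary F1, Cohen's kappa, and mean squared error. It is an illustration only: the exact evaluation protocol for each dataset (for example, how boundary positions are indexed, or how F1@3 is defined) is not stated on this page, so the inputs here are made-up toy labels and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of positions where prediction and reference agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Expected agreement if both labelings were independent draws.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error, e.g. between true and predicted boundary indices."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy labels, not taken from any dataset in the table.
    y_true = [1, 1, 0, 0]
    y_pred = [1, 0, 0, 0]
    print(accuracy(y_true, y_pred), f1_binary(y_true, y_pred), cohen_kappa(y_true, y_pred))
    print(mse([3, 5], [3, 7]))
```

Cohen's kappa is lower than raw accuracy whenever part of the agreement could be explained by chance, which is why a kappa of 0.4158 should not be read on the same scale as an accuracy percentage.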
