Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

RAFT: A Real-World Few-Shot Text Classification Benchmark

Neel Alex, Eli Lifland, Lewis Tunstall, Abhishek Thakur, Pegah Maham, C. Jess Riedel, Emmie Hine, Carolyn Ashurst, Paul Sedille, Alexis Carlier, Michael Noetel, Andreas Stuhlmüller

2021-09-28 | Text Classification, Few-Shot Learning, Few-Shot Text Classification, Classification

Abstract

Large pre-trained language models have shown promise for few-shot learning, completing text-based tasks given only a few task-specific examples. Will models soon solve classification tasks that have so far been reserved for human research assistants? Existing benchmarks are not designed to measure progress in applied settings, and so don't directly answer this question. The RAFT benchmark (Real-world Annotated Few-shot Tasks) focuses on naturally occurring tasks and uses an evaluation setup that mirrors deployment. Baseline evaluations on RAFT reveal areas current techniques struggle with: reasoning over long texts and tasks with many classes. Human baselines show that some classification tasks are difficult for non-expert humans, reflecting that real-world value sometimes depends on domain expertise. Yet even non-expert human baseline F1 scores exceed GPT-3 by an average of 0.11. The RAFT datasets and leaderboard will track which model improvements translate into real-world benefits at https://raft.elicit.org.

Results

Dataset: RAFT. The same results are listed under three tasks: Text Classification, Few-Shot Text Classification, and Classification. Column headers are the metric names as listed; "-" marks a metric with no listed value.

| Model | Avg | ADE | B77 | NIS | OSE | Over | SOT | SRI | TAI | TC | TEH | ToS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Human (crowdsourced) | 0.735 | 0.83 | 0.607 | 0.857 | 0.646 | 0.917 | 0.908 | 0.468 | 0.609 | 0.897 | 0.722 | 0.627 |
| GPT-3 | 0.627 | 0.686 | 0.299 | 0.679 | 0.431 | 0.937 | 0.769 | 0.516 | 0.656 | 0.821 | 0.526 | 0.574 |
| AdaBoost | 0.514 | 0.543 | 0.023 | 0.626 | 0.475 | 0.838 | 0.455 | 0.506 | 0.556 | 0.625 | 0.443 | 0.56 |
| GPT-Neo | 0.481 | 0.452 | 0.149 | 0.408 | 0.343 | 0.681 | 0.406 | 0.493 | 0.605 | 0.636 | 0.554 | 0.565 |
| GPT-2 | 0.458 | 0.6 | 0.121 | 0.561 | 0.245 | 0.498 | 0.38 | 0.492 | 0.612 | 0.723 | 0.311 | 0.498 |
| BART MNLI zero-shot | 0.382 | 0.234 | 0.332 | 0.615 | 0.36 | 0.462 | 0.644 | 0.026 | 0.469 | 0.4 | 0.543 | 0.122 |
| Plurality-class | 0.331 | 0.446 | - | 0.353 | 0.164 | 0.337 | 0.271 | 0.493 | 0.344 | 0.391 | 0.366 | 0.471 |
| GPT-3 zero-shot | 0.292 | 0.163 | - | 0.572 | 0.323 | 0.378 | 0.628 | 0.027 | 0.362 | 0.29 | 0.303 | 0.164 |
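
The abstract's claim that non-expert humans exceed GPT-3 by an average of 0.11 can be checked against the scores listed above. The sketch below is only a consistency check, not the authors' evaluation code. It assumes that Avg is the unweighted mean of the eleven per-task scores, which the page does not state; the variable and function names are illustrative.

```python
# Per-task scores copied from the results above (Human (crowdsourced) vs. few-shot GPT-3).
human = {
    "ADE": 0.83, "B77": 0.607, "NIS": 0.857, "OSE": 0.646, "Over": 0.917,
    "SOT": 0.908, "SRI": 0.468, "TAI": 0.609, "TC": 0.897, "TEH": 0.722,
    "ToS": 0.627,
}
gpt3 = {
    "ADE": 0.686, "B77": 0.299, "NIS": 0.679, "OSE": 0.431, "Over": 0.937,
    "SOT": 0.769, "SRI": 0.516, "TAI": 0.656, "TC": 0.821, "TEH": 0.526,
    "ToS": 0.574,
}


def mean_score(scores):
    """Unweighted mean over tasks (assumed to be how Avg is derived)."""
    return sum(scores.values()) / len(scores)


human_avg = mean_score(human)
gpt3_avg = mean_score(gpt3)
gap = human_avg - gpt3_avg

# Tasks where few-shot GPT-3 scores above the crowdsourced human baseline.
gpt3_leads = sorted(task for task in human if gpt3[task] > human[task])

print(f"Human avg: {human_avg:.3f}")
print(f"GPT-3 avg: {gpt3_avg:.3f}")
print(f"Gap:       {gap:.2f}")
print(f"GPT-3 ahead on: {gpt3_leads}")
```

Under that assumption the recomputed means round to the listed Avg values (0.735 and 0.627), and the gap rounds to the 0.11 quoted in the abstract. GPT-3 is ahead on three of the eleven tasks (Over, SRI, TAI).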

Related Papers

Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
GLAD: Generalizable Tuning for Vision-Language Models (2025-07-17)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Safeguarding Federated Learning-based Road Condition Classification (2025-07-16)
AI-Enhanced Pediatric Pneumonia Detection: A CNN-Based Approach Using Data Augmentation and Generative Adversarial Networks (GANs) (2025-07-13)
GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation (2025-07-10)
Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection (2025-07-10)