Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Sampling Bias in Deep Active Classification: An Empirical Study

Ameya Prabhu, Charles Dognin, Maneesh Singh

2019-09-20 · IJCNLP 2019
Tasks: Text Classification, Active Learning, General Classification, Classification
Links: Paper · PDF · Code (official)

Abstract

The exploding cost and time of data labeling and model training are bottlenecks for training DNN models on large datasets. Identifying smaller representative data samples with strategies like active learning can help mitigate such bottlenecks. Previous works on active learning in NLP identify the problem of sampling bias in the samples acquired by uncertainty-based querying and develop costly approaches to address it. Using a large empirical study, we demonstrate that active set selection using the posterior entropy of deep models like FastText.zip (FTZ) is robust to sampling biases and to various algorithmic choices (query size and strategies), contrary to what the traditional literature suggests. We also show that the FTZ-based query strategy produces sample sets similar to those from more sophisticated approaches (e.g., ensemble networks). Finally, we show the effectiveness of the selected samples by creating tiny high-quality datasets and using them for fast and cheap training of large models. Based on the above, we propose a simple baseline for deep active text classification that outperforms the state-of-the-art. We expect the presented work to be useful and informative for dataset compression and for problems involving active, semi-supervised, or online learning scenarios. Code and models are available at: https://github.com/drimpossible/Sampling-Bias-Active-Learning
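The query strategy the abstract describes — ranking unlabeled pool samples by the predictive (posterior) entropy of a cheap classifier such as FTZ and acquiring the most uncertain ones — can be sketched as below. This is an illustrative reimplementation, not the authors' code; the `entropy_query` helper and the toy probability pool are assumptions for demonstration.

```python
import numpy as np

def entropy_query(probs, k):
    """Select the k most uncertain pool samples by predictive entropy.

    probs: (n_samples, n_classes) array of class probabilities from the
           current model (e.g., an FTZ classifier in the paper's setup).
    Returns the indices of the k highest-entropy samples, most uncertain first.
    """
    eps = 1e-12  # guard against log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.argsort(-entropy)[:k]

# Toy unlabeled pool: confident, near-uniform, and in-between predictions.
pool = np.array([
    [0.98, 0.01, 0.01],  # low entropy (confident)
    [0.34, 0.33, 0.33],  # high entropy (uncertain)
    [0.70, 0.20, 0.10],  # intermediate
])
print(entropy_query(pool, 2))  # -> [1 2]
```

In an active-learning loop, the selected indices would be sent for labeling, added to the training set, and the model retrained before the next query round.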

Results

Task                 | Dataset        | Metric   | Value | Model
---------------------|----------------|----------|-------|--------------------
Text Classification  | Sogou News     | Accuracy | 97    | ULMFiT (Small data)
Text Classification  | DBpedia        | Error    | 0.8   | ULMFiT (Small data)
Text Classification  | Amazon-5       | Error    | 35.9  | ULMFiT (Small data)
Text Classification  | AG News        | Error    | 6.3   | ULMFiT (Small data)
Text Classification  | Yahoo! Answers | Accuracy | 74.3  | ULMFiT (Small data)
Text Classification  | Amazon-2       | Error    | 3.9   | ULMFiT (Small data)

Related Papers

- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
- Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
- Safeguarding Federated Learning-based Road Condition Classification (2025-07-16)
- A Risk-Aware Adaptive Robust MPC with Learned Uncertainty Quantification (2025-07-15)
- AI-Enhanced Pediatric Pneumonia Detection: A CNN-Based Approach Using Data Augmentation and Generative Adversarial Networks (GANs) (2025-07-13)
- GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation (2025-07-10)
- CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization (2025-07-08)