Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Balancing Methods for Multi-label Text Classification with Long-Tailed Class Distribution

Yi Huang, Buse Giledereli, Abdullatif Köksal, Arzucan Özgür, Elif Ozkirimli

2021-09-10 | EMNLP 2021 11 | Text Classification, Document Classification, Multi-Label Text Classification

Abstract

Multi-label text classification is a challenging task because it requires capturing label dependencies. It becomes even more challenging when the class distribution is long-tailed. Resampling and re-weighting are common approaches for addressing the class imbalance problem; however, they are not effective when there is label dependency in addition to class imbalance, because they result in oversampling of common labels. Here, we introduce the application of balancing loss functions for multi-label text classification. We perform experiments on a general domain dataset with 90 labels (Reuters-21578) and a domain-specific dataset from PubMed with 18211 labels. We find that a distribution-balanced loss function, which inherently addresses both the class imbalance and label linkage problems, outperforms commonly used loss functions. Distribution balancing methods have been successfully used in the image recognition field. Here, we show their effectiveness in natural language processing. Source code is available at https://github.com/Roche/BalancedLossNLP.
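
To make the idea of a re-weighted loss concrete, the following is a minimal sketch of a class-balanced focal loss for multi-label outputs. It is a generic illustration only: it is not the distribution-balanced loss evaluated in the paper, and it is not taken from the repository linked above. The function names, the `beta` and `gamma` defaults, and the toy inputs are all assumptions made for this example, and every class count is assumed to be at least 1.

```python
import math

def class_balanced_weights(class_counts, beta=0.9):
    """Per-class weight (1 - beta) / (1 - beta**n), where n is the class count.

    Rare classes (small n) receive larger weights than frequent ones.
    Assumes every count is >= 1 (hypothetical helper, not from the paper's code).
    """
    return [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]

def focal_bce(prob, target, gamma=2.0, eps=1e-12):
    """Binary focal loss for one label: -(1 - p_t)**gamma * log(p_t).

    p_t is the probability assigned to the true outcome, so confident
    correct predictions are down-weighted. gamma=0 gives plain BCE.
    """
    p_t = prob if target == 1 else 1.0 - prob
    return -((1.0 - p_t) ** gamma) * math.log(max(p_t, eps))

def class_balanced_focal_loss(probs, targets, class_counts, beta=0.9, gamma=2.0):
    """Sum the weighted focal loss over labels, then average over documents."""
    weights = class_balanced_weights(class_counts, beta)
    total = 0.0
    for row_p, row_t in zip(probs, targets):
        for w, p, t in zip(weights, row_p, row_t):
            total += w * focal_bce(p, t, gamma)
    return total / len(probs)

# Toy example: label 0 is frequent (1000 documents), label 1 is rare (5 documents).
counts = [1000, 5]
probs = [[0.9, 0.2], [0.8, 0.7]]    # predicted probabilities per label
targets = [[1, 0], [1, 1]]          # gold multi-hot labels
loss = class_balanced_focal_loss(probs, targets, counts)
```

In this sketch the rare label contributes more per error than the frequent one, which is the re-weighting behaviour the abstract refers to. The weights depend only on per-class counts, so the sketch does nothing about label co-occurrence, the problem the abstract attributes to the distribution-balanced loss.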

Results

Task                             Dataset        Metric    Value  Model
Multi-Label Text Classification  Reuters-21578  Micro-F1  90.74  CB-NTR
Multi-Label Text Classification  Reuters-21578  Micro-F1  90.7   NTR-FL
Multi-Label Text Classification  Reuters-21578  Micro-F1  90.62  DB
Text Classification              Reuters-21578  Micro-F1  90.74  CB-NTR
Text Classification              Reuters-21578  Micro-F1  90.7   NTR-FL
Text Classification              Reuters-21578  Micro-F1  90.62  DB
Classification                   Reuters-21578  Micro-F1  90.74  CB-NTR
Classification                   Reuters-21578  Micro-F1  90.7   NTR-FL
Classification                   Reuters-21578  Micro-F1  90.62  DB
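
For readers unfamiliar with the metric, the sketch below shows how Micro-F1 is commonly computed for multi-label predictions: true positives, false positives, and false negatives are pooled over every (document, label) pair before a single F1 is calculated. The function name `micro_f1` and the toy label matrices are assumptions for illustration. The function returns a fraction in [0, 1], while the values in the table appear to be percentages.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over multi-hot label matrices (lists of 0/1 rows).

    Counts are pooled across all documents and labels, so frequent labels
    dominate the score. Returns 0.0 when there are no positives at all.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy example with two documents and three labels.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
score = micro_f1(y_true, y_pred)   # tp=2, fp=1, fn=1
```

Because the counts are pooled, a model can score well on Micro-F1 while still doing poorly on rare labels, which is worth keeping in mind when reading results on a long-tailed dataset.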

Related Papers

Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
GNN-CNN: An Efficient Hybrid Model of Convolutional and Graph Neural Networks for Text Representation (2025-07-10)
The Trilemma of Truth in Large Language Models (2025-06-30)
Robustness of Misinformation Classification Systems to Adversarial Examples Through BeamAttack (2025-06-30)
Perspectives in Play: A Multi-Perspective Approach for More Inclusive NLP Systems (2025-06-25)
Can Generated Images Serve as a Viable Modality for Text-Centric Multimodal Learning? (2025-06-21)
SHREC and PHEONA: Using Large Language Models to Advance Next-Generation Computational Phenotyping (2025-06-19)
Flick: Few Labels Text Classification using K-Aware Intermediate Learning in Multi-Task Low-Resource Languages (2025-06-12)