Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

A benchmark for toxic comment classification on Civil Comments dataset

Corentin Duchene, Henri Jamet, Pierre Guillaume, Reda Dehak

2023-01-26 | Toxic Comment Classification

Abstract

Toxic comment detection on social media has proven to be essential for content moderation. This paper compares a wide range of models on a highly skewed multi-label hate speech dataset. Our comparison considers inference time and several metrics that measure performance and bias. We show that all BERT variants perform similarly regardless of model size, optimizations, or the language used for pre-training. RNNs are much faster at inference than any of the BERT variants, and BiLSTM remains a good compromise between performance and inference time. RoBERTa with Focal Loss offers the best performance on the bias metrics and AUROC, while DistilBERT combines a good AUROC with a low inference time. All models are affected by the bias of associating identities. BERT, RNN, and XLNet are less sensitive to this bias than the CNN and Compact Convolutional Transformers.
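The abstract contrasts RoBERTa trained with Focal Loss against binary cross-entropy (BCE). As a rough illustration of why focal loss can help on a highly skewed dataset, here is a minimal sketch of the standard per-label binary focal loss. The function names and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration; the hyperparameters and implementation used in the paper are not stated here.

```python
import math


def binary_cross_entropy(prob, label):
    """Plain BCE for one label; prob is the predicted positive-class probability."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = prob if label == 1 else 1.0 - prob  # probability given to the true class
    return -math.log(p_t)


def binary_focal_loss(prob, label, gamma=2.0, alpha=0.25):
    """Focal loss for one label: -alpha_t * (1 - p_t)^gamma * log(p_t).

    gamma and alpha are illustrative defaults (assumption), not the paper's settings.
    """
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)
    p_t = prob if label == 1 else 1.0 - prob
    alpha_t = alpha if label == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t)^gamma shrinks the loss of examples the model already gets right.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for prob in (0.95, 0.5, 0.1):
        print(prob, binary_cross_entropy(prob, 1), binary_focal_loss(prob, 1))
```

With `gamma=0` the modulating factor disappears and the loss reduces to class-weighted BCE. Larger `gamma` values put relatively more weight on hard, misclassified examples, which in a skewed dataset are often the rare toxic ones.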

Results

Task: Text Classification (the same rows are also listed under the task Classification). Dataset: Civil Comments. A "-" marks a metric with no listed value.

| Model | AUROC | GMB BNSP | GMB BPSN | GMB Subgroup | Macro F1 | Micro F1 | Precision | Recall |
|---|---|---|---|---|---|---|---|---|
| RoBERTa Focal Loss | 0.9818 | 0.9581 | 0.901 | 0.8807 | 0.4648 | 0.5524 | 0.4017 | 0.8839 |
| AlBERT | 0.979 | 0.9499 | 0.8982 | 0.8734 | 0.3541 | 0.4845 | 0.3247 | 0.9104 |
| BERTweet | 0.979 | 0.9603 | 0.8945 | 0.878 | 0.3612 | 0.4928 | 0.3363 | 0.9216 |
| HateBERT | 0.9791 | 0.9589 | 0.8915 | 0.8744 | 0.3679 | 0.4844 | 0.3297 | 0.9165 |
| RoBERTa BCE | 0.9813 | 0.9616 | 0.8901 | 0.88 | 0.4749 | 0.5359 | 0.3836 | 0.8891 |
| XLM RoBERTa | - | - | 0.8859 | - | - | 0.468 | 0.3135 | 0.923 |
| XLNet | - | 0.9597 | 0.8834 | 0.8689 | 0.3336 | 0.4586 | 0.3045 | 0.9254 |
| DistilBERT | 0.9804 | 0.9644 | 0.874 | 0.8762 | 0.3879 | 0.5115 | 0.3572 | 0.9001 |
| BiGRU | - | - | 0.8616 | - | - | - | - | - |
| Unfreeze Glove ResNet 44 | 0.966 | - | 0.8493 | 0.8421 | 0.4648 | 0.5958 | 0.4835 | 0.7759 |
| Unfreeze Glove ResNet 56 | 0.9639 | - | 0.8445 | 0.8487 | 0.3778 | - | - | 0.8707 |
| Compact Convolutional Transformer (CCT) | 0.9526 | 0.9447 | 0.8307 | 0.8133 | 0.3428 | 0.4874 | 0.3507 | 0.7983 |
| Freeze Glove ResNet 44 | - | - | 0.7876 | 0.8219 | 0.4189 | 0.5591 | 0.4631 | 0.7053 |
| BiLSTM | - | - | - | 0.8636 | - | 0.5115 | 0.3572 | - |
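
The GMB Subgroup, GMB BPSN, and GMB BNSP columns appear to follow the bias metrics popularized by the Jigsaw Unintended Bias challenge on Civil Comments: an AUC computed per identity subgroup on three different slices of the data, then combined across subgroups with a generalized (power) mean. The sketch below shows that computation under stated assumptions. The exponent `p=-5` is the value used in that challenge and is an assumption here, and all function names are hypothetical, not taken from the paper's code.

```python
def auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")  # AUC is undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bias_aucs(labels, scores, in_subgroup):
    """Three bias AUCs for one identity subgroup.

    subgroup: only comments that mention the identity.
    bpsn: Background Positive, Subgroup Negative
          (non-toxic subgroup comments plus toxic background comments).
    bnsp: Background Negative, Subgroup Positive
          (toxic subgroup comments plus non-toxic background comments).
    """
    rows = list(zip(labels, scores, in_subgroup))

    def pick(keep):
        kept = [(y, s) for y, s, g in rows if keep(y, g)]
        return auc([y for y, _ in kept], [s for _, s in kept])

    return {
        "subgroup": pick(lambda y, g: g),
        "bpsn": pick(lambda y, g: (g and y == 0) or (not g and y == 1)),
        "bnsp": pick(lambda y, g: (g and y == 1) or (not g and y == 0)),
    }


def generalized_mean(values, p=-5.0):
    """Power mean; a negative p pulls the result toward the worst subgroup."""
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 1, 0]
    scores = [0.9, 0.2, 0.8, 0.7, 0.6, 0.1]
    in_subgroup = [True, True, False, True, False, False]
    print(bias_aucs(labels, scores, in_subgroup))
    print(generalized_mean([0.5, 1.0]))
```

A low BPSN AUC suggests the model scores non-toxic comments that mention an identity higher than toxic comments that do not, which is the identity-association bias the abstract describes.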

Related Papers

Applying LLMs to Active Learning: Towards Cost-Efficient Cross-Task Text Classification without Manually Labeled Data (2025-02-24)
NLPGuard: A Framework for Mitigating the Use of Protected Attributes by NLP Classifiers (2024-07-01)
PyTorch Frame: A Modular Framework for Multi-Modal Tabular Learning (2024-03-31)
Evaluating The Effectiveness of Capsule Neural Network in Toxic Comment Classification using Pre-trained BERT Embeddings (2023-10-12)
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing (2023-05-19)
Beyond Toxic: Toxicity Detection Datasets are Not Enough for Brand Safety (2023-03-27)
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers (2022-02-22)
A Survey of Toxic Comment Classification Methods (2021-12-13)