Corentin Duchene, Henri Jamet, Pierre Guillaume, Reda Dehak
Toxic comment detection on social media has proven to be essential for content moderation. This paper compares a wide range of models on a highly skewed multi-label hate speech dataset. Our comparison considers inference time as well as several metrics that measure performance and bias. We show that all BERT models have similar performance regardless of their size, their optimizations, or the language used to pre-train them. RNNs are much faster at inference than any of the BERT models, and BiLSTM remains a good compromise between performance and inference time. RoBERTa with Focal Loss offers the best performance on the bias metrics and on AUROC, whereas DistilBERT combines good AUROC with low inference time. All models are affected by the bias of associating identity mentions with toxicity, but the BERT, RNN, and XLNet models are less sensitive to it than the CNN and Compact Convolutional Transformer models.
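The abstract contrasts RoBERTa trained with binary cross-entropy (BCE) and with Focal Loss. The sketch below shows how the two per-label losses differ for multi-label sigmoid outputs, assuming the standard focal loss form $-\alpha_t (1 - p_t)^{\gamma} \log p_t$. The default `gamma=2.0`, the optional `alpha`, and the function names are illustrative assumptions, not hyperparameters reported by the paper; a real training loop would use a deep learning framework's tensor operations rather than pure Python.

```python
import math
from typing import Optional, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


EPS = 1e-12  # guards log(0) when a probability saturates at exactly 0 or 1


def bce_loss(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over a flattened (examples x labels) grid,
    with one independent sigmoid per label as in multi-label classification."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p + EPS) + (1 - y) * math.log(1.0 - p + EPS))
    return total / len(logits)


def focal_loss(
    logits: Sequence[float],
    targets: Sequence[int],
    gamma: float = 2.0,             # assumed value (common default), not from the paper
    alpha: Optional[float] = None,  # optional class weighting, also an assumption
) -> float:
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).
    With gamma = 0 and alpha = None it reduces to bce_loss."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
        alpha_t = 1.0 if alpha is None else (alpha if y == 1 else 1.0 - alpha)
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t + EPS)
    return total / len(logits)


if __name__ == "__main__":
    # A confidently correct (z = +4) and a badly wrong (z = -4) prediction for a
    # positive label: focal loss nearly ignores the first and keeps the second.
    for z in (4.0, -4.0):
        print(f"z={z:+.0f}  BCE={bce_loss([z], [1]):.4f}  focal={focal_loss([z], [1]):.4f}")
```

Because the $(1 - p_t)^{\gamma}$ factor shrinks the loss on examples the model already classifies confidently, the many easy non-toxic comments in a skewed dataset contribute less to training, which is the usual motivation for focal loss on imbalanced data.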
| Model | AUROC | GMB Subgroup | GMB BPSN | GMB BNSP | Macro F1 | Micro F1 | Precision | Recall |
|---|---|---|---|---|---|---|---|---|
| RoBERTa Focal Loss | 0.9818 | 0.8807 | 0.901 | 0.9581 | 0.4648 | 0.5524 | 0.4017 | 0.8839 |
| ALBERT | 0.979 | 0.8734 | 0.8982 | 0.9499 | 0.3541 | 0.4845 | 0.3247 | 0.9104 |
| BERTweet | 0.979 | 0.878 | 0.8945 | 0.9603 | 0.3612 | 0.4928 | 0.3363 | 0.9216 |
| HateBERT | 0.9791 | 0.8744 | 0.8915 | 0.9589 | 0.3679 | 0.4844 | 0.3297 | 0.9165 |
| RoBERTa BCE | 0.9813 | 0.88 | 0.8901 | 0.9616 | 0.4749 | 0.5359 | 0.3836 | 0.8891 |
| XLM RoBERTa | – | – | 0.8859 | – | – | 0.468 | 0.3135 | 0.923 |
| XLNet | – | 0.8689 | 0.8834 | 0.9597 | 0.3336 | 0.4586 | 0.3045 | 0.9254 |
| DistilBERT | 0.9804 | 0.8762 | 0.874 | 0.9644 | 0.3879 | 0.5115 | 0.3572 | 0.9001 |
| BiGRU | – | – | 0.8616 | – | – | – | – | – |
| Unfreeze GloVe ResNet 44 | 0.966 | 0.8421 | 0.8493 | – | 0.4648 | 0.5958 | 0.4835 | 0.7759 |
| Unfreeze GloVe ResNet 56 | 0.9639 | 0.8487 | 0.8445 | – | 0.3778 | – | – | 0.8707 |
| Compact Convolutional Transformer (CCT) | 0.9526 | 0.8133 | 0.8307 | 0.9447 | 0.3428 | 0.4874 | 0.3507 | 0.7983 |
| Freeze GloVe ResNet 44 | – | 0.8219 | 0.7876 | – | 0.4189 | 0.5591 | 0.4631 | 0.7053 |
| BiLSTM | – | 0.8636 | – | – | – | 0.5115 | 0.3572 | – |

Text classification results on the Civil Comments dataset. GMB denotes the generalized mean of per-subgroup bias AUCs (Subgroup, BPSN, BNSP). A dash (–) marks a metric that is not listed for that model.
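
The three GMB columns are presumably the unintended-bias metrics popularized by the Jigsaw Unintended Bias in Toxicity Classification challenge on Civil Comments: an AUC restricted to comments that mention an identity (Subgroup), an AUC of toxic background comments against non-toxic identity comments (BPSN), and the reverse pairing (BNSP), each averaged over identity subgroups with a power mean. The sketch below assumes that definition and the exponent `p = -5` used in that challenge; the paper's exact settings are not stated here, and the function names are hypothetical. AUC is computed with a direct O(n²) pairwise comparison for readability only.

```python
from typing import Callable, Dict, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen positive is scored above a randomly chosen negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bias_aucs(
    scores: Sequence[float], labels: Sequence[int], in_group: Sequence[bool]
) -> Dict[str, float]:
    """Subgroup, BPSN and BNSP AUCs for ONE identity subgroup.
    labels: 1 = toxic, 0 = non-toxic; in_group: True if the comment mentions the identity."""
    rows = list(zip(scores, labels, in_group))

    def auc_on(keep: Callable[[bool, bool], bool]) -> float:
        sel = [(s, y) for s, y, g in rows if keep(bool(y), bool(g))]
        return auc([s for s, _ in sel], [y for _, y in sel])

    return {
        # Only comments that mention the identity.
        "subgroup": auc_on(lambda y, g: g),
        # Background Positive, Subgroup Negative: toxic background vs non-toxic subgroup.
        "bpsn": auc_on(lambda y, g: (y and not g) or (not y and g)),
        # Background Negative, Subgroup Positive: non-toxic background vs toxic subgroup.
        "bnsp": auc_on(lambda y, g: (not y and not g) or (y and g)),
    }


def generalized_mean(values: Sequence[float], p: float = -5.0) -> float:
    """Power mean of per-subgroup AUCs; a negative p pulls the result toward
    the worst-performing subgroup."""
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


if __name__ == "__main__":
    # Toy data: the non-toxic identity comment scored 0.85 outranks a toxic
    # background comment (0.8) and the toxic identity comment (0.7), which
    # lowers the BPSN and Subgroup AUCs.
    scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.85, 0.4, 0.1]
    labels = [1, 1, 0, 0, 1, 0, 0, 0]
    in_group = [False] * 4 + [True] * 4
    per_group = bias_aucs(scores, labels, in_group)
    print({k: round(v, 3) for k, v in per_group.items()})
    # {'subgroup': 0.667, 'bpsn': 0.833, 'bnsp': 1.0}
```

In a full evaluation, `bias_aucs` would be run once per identity subgroup, and each of the three metrics would then be combined across subgroups with `generalized_mean`, so a single badly served identity drags the GMB score down more than an arithmetic mean would.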