Jingbiao Mei, Jinghong Chen, Guangyu Yang, Weizhe Lin, Bill Byrne
Hateful memes have become a significant concern on the Internet, necessitating robust automated detection systems. While Large Multimodal Models (LMMs) have shown promise in hateful meme detection, they face notable challenges such as sub-optimal performance and limited out-of-domain generalization. Recent studies further reveal the limitations of both supervised fine-tuning (SFT) and in-context learning when applied to LMMs in this setting. To address these issues, we propose a robust adaptation framework for hateful meme detection that improves in-domain accuracy and cross-domain generalization while preserving the general vision-language capabilities of LMMs. Experiments on six meme classification datasets show that our approach achieves state-of-the-art performance, outperforming larger agentic systems. Moreover, our method generates higher-quality rationales for explaining hateful content than standard SFT, enhancing model interpretability.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Meme Classification | Hateful Memes | Accuracy | 82.1 | RA-HMD (Qwen2-VL-7B) |
| Meme Classification | Hateful Memes | AUROC | 91.1 | RA-HMD (Qwen2-VL-7B) |
| Meme Classification | Hateful Memes | Accuracy | 80.9 | RA-HMD (LLaVA-1.5-7B) |
| Meme Classification | Hateful Memes | AUROC | 89.7 | RA-HMD (LLaVA-1.5-7B) |
| Meme Classification | Hateful Memes | Accuracy | 79.1 | RA-HMD (Qwen2-VL-2B) |
| Meme Classification | Hateful Memes | AUROC | 88.4 | RA-HMD (Qwen2-VL-2B) |
| Meme Classification | MultiOFF | Accuracy | 71.1 | RA-HMD (Qwen2-VL-7B) |
| Meme Classification | MultiOFF | F1 | 64.8 | RA-HMD (Qwen2-VL-7B) |
| Meme Classification | HarMeme | AUROC | 93.2 | RA-HMD (Qwen2-VL-7B) |
| Meme Classification | HarMeme | Accuracy | 88.1 | RA-HMD (Qwen2-VL-7B) |
| Meme Classification | HarMeme | AUROC | 92.9 | RA-HMD (Qwen2-VL-2B) |
| Meme Classification | HarMeme | Accuracy | 87.7 | RA-HMD (Qwen2-VL-2B) |
| Meme Classification | Harm-P | Accuracy | 91.6 | RA-HMD (Qwen2-VL-7B) |
| Meme Classification | Harm-P | F1 | 91.1 | RA-HMD (Qwen2-VL-7B) |
| Meme Classification | PrideMM | Accuracy | 78.1 | RA-HMD (Qwen2-VL-7B) |
| Meme Classification | PrideMM | F1 | 78.4 | RA-HMD (Qwen2-VL-7B) |
| Meme Classification | PrideMM | Accuracy | 76 | RA-HMD (Qwen2-VL-2B) |
| Meme Classification | PrideMM | F1 | 76.7 | RA-HMD (Qwen2-VL-2B) |
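
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how accuracy, F1, and AUROC could be computed for a binary hateful/benign classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the F1 shown is the plain positive-class F1 (the averaging scheme behind the F1 values in the table is not stated here).

```python
from typing import Sequence


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of memes whose predicted label matches the gold label."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def f1_positive(labels: Sequence[int], preds: Sequence[int]) -> float:
    """F1 of the positive (hateful = 1) class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random hateful meme is scored above a random
    benign one (ties count as half). Quadratic in dataset size, which is
    fine for a sketch but not for large evaluations."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from any of the datasets above).
labels = [1, 0, 1, 0, 1, 0]               # 1 = hateful, 0 = benign
scores = [0.9, 0.2, 0.4, 0.6, 0.8, 0.1]   # model's hateful-class scores
preds = [int(s >= 0.5) for s in scores]   # assumed 0.5 decision threshold

print(f"Accuracy: {100 * accuracy(labels, preds):.1f}")   # 66.7
print(f"F1:       {100 * f1_positive(labels, preds):.1f}")  # 66.7
print(f"AUROC:    {100 * auroc(labels, scores):.1f}")     # 88.9
```

Accuracy and F1 depend on the chosen threshold, whereas AUROC is computed from the raw scores and is threshold-free, which is one reason the two kinds of metric are commonly reported side by side.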