
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset

Dan Saattrup Nielsen, Ryan McConville

2022-02-23 | Misinformation | Node Classification

Abstract

Misinformation is becoming increasingly prevalent on social media and in news articles. It has become so widespread that we require algorithmic assistance utilising machine learning to detect such content. Training these machine learning models requires datasets of sufficient scale, diversity and quality. However, datasets in the field of automatic misinformation detection are predominantly monolingual, cover a limited number of modalities and are not of sufficient scale and quality. Addressing this, we develop a data collection and linking system (MuMiN-trawl) to build a public misinformation graph dataset (MuMiN), containing rich social media data (tweets, replies, users, images, articles, hashtags) spanning 21 million tweets belonging to 26 thousand Twitter threads, each of which has been semantically linked to 13 thousand fact-checked claims across dozens of topics, events and domains, in 41 different languages, spanning more than a decade. The dataset is made available as a heterogeneous graph via a Python package (mumin). We provide baseline results for two node classification tasks related to the veracity of a claim involving social media, and demonstrate that these are challenging tasks, with the highest macro-average F1-score being 62.55% and 61.45% for the two tasks, respectively. The MuMiN ecosystem is available at https://mumin-dataset.github.io/, including the data, documentation, tutorials and leaderboards.
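The abstract reports macro-average F1, which is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a common one. The following is a minimal sketch of that metric in plain Python. The function name `macro_f1`, the label strings and the toy predictions are assumptions made up for illustration; they are not taken from the MuMiN data or the `mumin` package.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, hypothetical labels: 8 items of one class and 2 of the other.
y_true = ["misinformation"] * 8 + ["factual"] * 2
y_pred = ["misinformation"] * 7 + ["factual"] * 3

score = macro_f1(y_true, y_pred)
print(f"macro-F1 = {score:.4f}")
```

In this toy case the per-class F1 scores are 14/15 and 4/5, and the macro average is their plain mean, regardless of how many items each class has.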

Results

Task                 Dataset       Metric                         Value   Model
Node Classification  MuMiN-large   Claim Classification Macro-F1  0.598   HeteroGraphSAGE
Node Classification  MuMiN-large   Tweet Classification Macro-F1  0.6145  HeteroGraphSAGE
Node Classification  MuMiN-large   Claim Classification Macro-F1  0.579   LaBSE
Node Classification  MuMiN-large   Tweet Classification Macro-F1  0.528   LaBSE
Node Classification  MuMiN-large   Claim Classification Macro-F1  0.4813  Majority class
Node Classification  MuMiN-large   Tweet Classification Macro-F1  0.4887  Majority class
Node Classification  MuMiN-large   Claim Classification Macro-F1  0.3879  Random
Node Classification  MuMiN-large   Tweet Classification Macro-F1  0.369   Random
Node Classification  MuMiN-small   Claim Classification Macro-F1  0.6255  LaBSE
Node Classification  MuMiN-small   Tweet Classification Macro-F1  0.545   LaBSE
Node Classification  MuMiN-small   Claim Classification Macro-F1  0.5795  HeteroGraphSAGE
Node Classification  MuMiN-small   Tweet Classification Macro-F1  0.5605  HeteroGraphSAGE
Node Classification  MuMiN-small   Claim Classification Macro-F1  0.4756  Majority class
Node Classification  MuMiN-small   Tweet Classification Macro-F1  0.4877  Majority class
Node Classification  MuMiN-small   Claim Classification Macro-F1  0.4007  Random
Node Classification  MuMiN-small   Tweet Classification Macro-F1  0.3718  Random
Node Classification  MuMiN-medium  Claim Classification Macro-F1  0.577   HeteroGraphSAGE
Node Classification  MuMiN-medium  Tweet Classification Macro-F1  0.541   HeteroGraphSAGE
Node Classification  MuMiN-medium  Claim Classification Macro-F1  0.5585  LaBSE
Node Classification  MuMiN-medium  Tweet Classification Macro-F1  0.5745  LaBSE
Node Classification  MuMiN-medium  Claim Classification Macro-F1  0.4806  Majority class
Node Classification  MuMiN-medium  Tweet Classification Macro-F1  0.4856  Majority class
Node Classification  MuMiN-medium  Claim Classification Macro-F1  0.3896  Random
Node Classification  MuMiN-medium  Tweet Classification Macro-F1  0.3772  Random
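The majority-class rows all sit just below 0.5, which is what macro-F1 produces for a constant predictor on imbalanced binary labels. As a rough sketch, assume exactly two classes and a baseline that always predicts the more frequent one (assumptions of this example, not statements from the page). If the majority class makes up a fraction p of the test nodes, its F1 is 2p/(1+p), the other class scores 0, and the macro average is p/(1+p). The helper names below are hypothetical.

```python
def majority_macro_f1(p):
    """Macro-F1 of an always-predict-majority baseline on binary labels.

    p is the fraction of examples in the majority class (0.5 <= p < 1).
    Majority class: precision = p, recall = 1, so F1 = 2p / (1 + p).
    Minority class: never predicted, so F1 = 0.
    """
    f1_majority = 2 * p / (1 + p)
    f1_minority = 0.0
    return (f1_majority + f1_minority) / 2


def implied_majority_share(macro_f1):
    """Invert macro_f1 = p / (1 + p) to recover p (valid for macro_f1 < 0.5)."""
    return macro_f1 / (1 - macro_f1)


# Majority-class claim score for MuMiN-large, copied from the table above.
reported = 0.4813
share = implied_majority_share(reported)
print(f"implied majority share = {share:.3f}")
```

Under these assumptions the reported 0.4813 would correspond to a majority share of roughly 0.93. This is only an inference from the metric's definition: the actual class balance is not stated on this page, and the reported scores may be averaged in a way this sketch does not model.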

Related Papers

SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks (2025-07-17)
Leveraging Pre-Trained Visual Models for AI-Generated Video Detection (2025-07-17)
KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection (2025-07-13)
LLM-Stackelberg Games: Conjectural Reasoning Equilibria and Their Applications to Spearphishing (2025-07-12)
Multi-Agent Retrieval-Augmented Framework for Evidence-Based Counterspeech Against Health Misinformation (2025-07-09)
LLMs are Introvert (2025-07-08)
Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors (2025-07-08)
The Ethical Implications of AI in Creative Industries: A Focus on AI-Generated Art (2025-07-08)