Data sourced from the PWC Archive (CC-BY-SA 4.0).


Supervised Multimodal Bitransformers for Classifying Images and Text

Douwe Kiela, Suvrat Bhooshan, Hamed Firooz, Ethan Perez, Davide Testuggine

Published: 2019-09-06
Tasks: Natural Language Inference, General Classification

Abstract

Self-supervised bidirectional transformer models such as BERT have led to dramatic improvements in a wide variety of textual classification tasks. The modern digital world is increasingly multimodal, however, and textual information is often accompanied by other modalities such as images. We introduce a supervised multimodal bitransformer model that fuses information from text and image encoders, and obtain state-of-the-art performance on various multimodal classification benchmark tasks, outperforming strong baselines, including on hard test sets specifically designed to measure multimodal performance.
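The abstract says only that the model "fuses information from text and image encoders". One way a single-stream bitransformer can consume both modalities is to linearly project image-encoder features into the text embedding space and place them in the same input sequence as the text tokens, so self-attention runs over both. The sketch below illustrates that idea in plain Python under stated assumptions: the image features are arbitrary vectors standing in for pooled image-encoder outputs, and the function names, the [CLS]/[SEP] layout, and the segment-id scheme are hypothetical choices for illustration, not the authors' released code.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def project(feature: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Linear map into the text embedding space: weight @ feature + bias.

    `weight` has one row per output dimension; each row has len(feature) entries.
    """
    return [sum(w * x for w, x in zip(row, feature)) + b
            for row, b in zip(weight, bias)]


def build_fused_input(image_features: List[Vector], text_embeddings: List[Vector],
                      weight: Matrix, bias: Vector,
                      cls: Vector, sep: Vector) -> Tuple[List[Vector], List[int]]:
    """Return (sequence, segment_ids) for one bitransformer input.

    Hypothetical layout: [CLS] img_1..img_N [SEP] txt_1..txt_M [SEP],
    with segment id 0 for the image part and 1 for the text part.
    """
    dim = len(bias)
    # Every token in the shared sequence must live in the same embedding space.
    for vec in text_embeddings + [cls, sep]:
        if len(vec) != dim:
            raise ValueError("text, CLS and SEP embeddings must match the projection size")
    image_tokens = [project(f, weight, bias) for f in image_features]
    sequence = [cls] + image_tokens + [sep] + text_embeddings + [sep]
    segment_ids = [0] * (len(image_tokens) + 2) + [1] * (len(text_embeddings) + 1)
    return sequence, segment_ids


# Toy example: two 3-d "image features" projected to 2-d, one 2-d text token.
sequence, segment_ids = build_fused_input(
    image_features=[[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]],
    text_embeddings=[[0.5, 0.5]],
    weight=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
    bias=[0.0, 0.0],
    cls=[1.0, 1.0],
    sep=[0.0, 0.0],
)
print(segment_ids)  # [0, 0, 0, 0, 1, 1]
```

In a full model, the fused sequence would typically also receive position embeddings and pass through the transformer, with a classifier reading the first ([CLS]) output; those parts are omitted here to keep the fusion step visible.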

Results

Task | Dataset | Metric | Value | Model
Natural Language Inference | V-SNLI | Accuracy | 90.5 | MMBT

Related Papers

LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification (2025-07-15)
DS@GT at CheckThat! 2025: Evaluating Context and Tokenization Strategies for Numerical Fact Verification (2025-07-08)
ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation (2025-06-27)
Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding (2025-06-17)
When Does Meaning Backfire? Investigating the Role of AMRs in NLI (2025-06-17)
Explainable Compliance Detection with Multi-Hop Natural Language Inference on Assurance Case Structure (2025-06-10)
Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models (2025-06-08)
A MISMATCHED Benchmark for Scientific Natural Language Inference (2025-06-05)