Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

VALSE

VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

Modalities: Images, Texts
Introduced: 2021-12-14

We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark designed for testing general-purpose pretrained vision and language (V&L) models for their visio-linguistic grounding capabilities on specific linguistic phenomena. VALSE offers a suite of six tests covering various linguistic constructs. Solving these requires models to ground linguistic phenomena in the visual modality, allowing more fine-grained evaluations than hitherto possible. We expect VALSE to serve as an important benchmark to measure future progress of pretrained V&L models from a linguistic perspective, complementing the canonical task-centred V&L evaluations.

Benchmarks

Multimodal Deep Learning/average pairwise accuracy
Multimodal Deep Learning/Average Accuracy
Multimodal Text and Image Classification/average pairwise accuracy
Multimodal Text and Image Classification/Average Accuracy
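
The two metric names listed here can be illustrated with a short sketch. This page does not define them, so the following is an assumption: pairwise accuracy is taken to be the fraction of caption/foil pairs in which a model's image-sentence alignment score is higher for the correct caption than for its foil, and accuracy is taken to be plain classification accuracy over captions (to be accepted) and foils (to be rejected) at a fixed threshold. The function names, the 0.5 threshold, and the example scores are hypothetical and not part of any VALSE tooling.

```python
from typing import Sequence


def pairwise_accuracy(caption_scores: Sequence[float],
                      foil_scores: Sequence[float]) -> float:
    """Fraction of pairs where the correct caption outscores its foil.

    Assumed definition; ties count as failures.
    """
    if len(caption_scores) != len(foil_scores):
        raise ValueError("expected one foil score per caption score")
    if not caption_scores:
        raise ValueError("need at least one caption/foil pair")
    wins = sum(c > f for c, f in zip(caption_scores, foil_scores))
    return wins / len(caption_scores)


def classification_accuracy(caption_scores: Sequence[float],
                            foil_scores: Sequence[float],
                            threshold: float = 0.5) -> float:
    """Accuracy over both classes: captions accepted, foils rejected.

    The threshold is a hypothetical choice for illustration.
    """
    total = len(caption_scores) + len(foil_scores)
    if total == 0:
        raise ValueError("need at least one score")
    correct = sum(c >= threshold for c in caption_scores)
    correct += sum(f < threshold for f in foil_scores)
    return correct / total


# Made-up alignment scores for four image/caption/foil triples.
captions = [0.9, 0.4, 0.7, 0.6]
foils = [0.2, 0.5, 0.3, 0.6]

pair_acc = pairwise_accuracy(captions, foils)
cls_acc = classification_accuracy(captions, foils)
print(f"pairwise accuracy: {pair_acc:.3f}")
print(f"accuracy: {cls_acc:.3f}")
```

Under these assumptions, the "average" variants in the list above would presumably be the mean of such per-test values across the VALSE tests, though the page does not say so explicitly.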

Related Benchmarks

VALSE actant swap/Multimodal Deep Learning/Accuracy (%)
VALSE actant swap/Multimodal Deep Learning/pairwise accuracy
VALSE actant swap/Multimodal Text and Image Classification/Accuracy (%)
VALSE actant swap/Multimodal Text and Image Classification/pairwise accuracy
VALSE action replacement/Multimodal Deep Learning/Accuracy (%)
VALSE action replacement/Multimodal Deep Learning/pairwise accuracy
VALSE action replacement/Multimodal Text and Image Classification/Accuracy (%)
VALSE action replacement/Multimodal Text and Image Classification/pairwise accuracy
VALSE coreference clean/Multimodal Deep Learning/Accuracy (%)
VALSE coreference clean/Multimodal Deep Learning/pairwise accuracy
VALSE coreference clean/Multimodal Text and Image Classification/Accuracy (%)
VALSE coreference clean/Multimodal Text and Image Classification/pairwise accuracy
VALSE coreference standard/Multimodal Deep Learning/Accuracy (%)
VALSE coreference standard/Multimodal Deep Learning/pairwise accuracy
VALSE coreference standard/Multimodal Text and Image Classification/Accuracy (%)
VALSE coreference standard/Multimodal Text and Image Classification/pairwise accuracy
VALSE counting adversarial/Multimodal Deep Learning/Accuracy (%)
VALSE counting adversarial/Multimodal Deep Learning/pairwise accuracy
VALSE counting adversarial/Multimodal Text and Image Classification/Accuracy (%)
VALSE counting adversarial/Multimodal Text and Image Classification/pairwise accuracy
VALSE counting balanced/Multimodal Deep Learning/Accuracy (%)
VALSE counting balanced/Multimodal Deep Learning/pairwise accuracy
VALSE counting balanced/Multimodal Text and Image Classification/Accuracy (%)
VALSE counting balanced/Multimodal Text and Image Classification/pairwise accuracy
VALSE counting small numbers/Multimodal Deep Learning/Accuracy (%)
VALSE counting small numbers/Multimodal Deep Learning/pairwise accuracy
VALSE counting small numbers/Multimodal Text and Image Classification/Accuracy (%)
VALSE counting small numbers/Multimodal Text and Image Classification/pairwise accuracy
VALSE existence/Multimodal Deep Learning/Accuracy (%)
VALSE existence/Multimodal Deep Learning/pairwise accuracy
VALSE existence/Multimodal Text and Image Classification/Accuracy (%)
VALSE existence/Multimodal Text and Image Classification/pairwise accuracy
VALSE foil-it (noun phrases)/Multimodal Deep Learning/Accuracy (%)
VALSE foil-it (noun phrases)/Multimodal Deep Learning/pairwise accuracy
VALSE foil-it (noun phrases)/Multimodal Text and Image Classification/Accuracy (%)
VALSE foil-it (noun phrases)/Multimodal Text and Image Classification/pairwise accuracy
VALSE plurality/Multimodal Deep Learning/Accuracy (%)
VALSE plurality/Multimodal Deep Learning/pairwise accuracy
VALSE plurality/Multimodal Text and Image Classification/Accuracy (%)
VALSE plurality/Multimodal Text and Image Classification/pairwise accuracy
VALSE spatial relations/Multimodal Deep Learning/Accuracy (%)
VALSE spatial relations/Multimodal Deep Learning/pairwise accuracy
VALSE spatial relations/Multimodal Text and Image Classification/Accuracy (%)
VALSE spatial relations/Multimodal Text and Image Classification/pairwise accuracy

Statistics

Papers: 26
Benchmarks: 4

Tasks

Multimodal Deep Learning
Multimodal Text and Image Classification
image-sentence alignment