Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

VALSE

VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

Modalities: Images, Texts
Introduced: 2021-12-14

We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark designed for testing general-purpose pretrained vision and language (V&L) models for their visio-linguistic grounding capabilities on specific linguistic phenomena. VALSE offers a suite of six tests covering various linguistic constructs. Solving these requires models to ground linguistic phenomena in the visual modality, allowing more fine-grained evaluations than hitherto possible. We expect VALSE to serve as an important benchmark to measure future progress of pretrained V&L models from a linguistic perspective, complementing the canonical task-centred V&L evaluations.

Benchmarks

Multimodal Deep Learning / average pairwise accuracy
Multimodal Deep Learning / Average Accuracy
Multimodal Text and Image Classification / average pairwise accuracy
Multimodal Text and Image Classification / Average Accuracy
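This page lists pairwise accuracy as a metric without defining it. As a rough sketch, assuming it means the fraction of caption/foil pairs for which a model gives the correct caption a strictly higher image-sentence alignment score than its foil, it could be computed as below. The function name and the example scores are hypothetical and are not taken from the benchmark's own code.

```python
from typing import Sequence


def pairwise_accuracy(caption_scores: Sequence[float],
                      foil_scores: Sequence[float]) -> float:
    """Fraction of (caption, foil) pairs where the caption scores strictly higher.

    Assumption: caption_scores[i] and foil_scores[i] are alignment scores
    for the same image, paired with the correct caption and its foil.
    """
    if len(caption_scores) != len(foil_scores):
        raise ValueError("caption_scores and foil_scores must have the same length")
    if len(caption_scores) == 0:
        raise ValueError("at least one pair is required")
    # A tie counts as a failure: the model did not prefer the caption.
    wins = sum(c > f for c, f in zip(caption_scores, foil_scores))
    return wins / len(caption_scores)


# Hypothetical alignment scores for four image/caption/foil triples.
caption_scores = [0.91, 0.40, 0.75, 0.55]
foil_scores = [0.30, 0.62, 0.75, 0.10]
print(pairwise_accuracy(caption_scores, foil_scores))  # 2 of 4 pairs ranked correctly
```

Counting ties as failures is a design choice in this sketch; an evaluation script could reasonably treat them differently.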

Related Benchmarks

VALSE actant swap: Multimodal Deep Learning (Accuracy (%), pairwise accuracy); Multimodal Text and Image Classification (Accuracy (%), pairwise accuracy)
VALSE action replacement: Multimodal Deep Learning (Accuracy (%), pairwise accuracy); Multimodal Text and Image Classification (Accuracy (%), pairwise accuracy)
VALSE coreference clean: Multimodal Deep Learning (Accuracy (%), pairwise accuracy); Multimodal Text and Image Classification (Accuracy (%), pairwise accuracy)
VALSE coreference standard: Multimodal Deep Learning (Accuracy (%), pairwise accuracy); Multimodal Text and Image Classification (Accuracy (%), pairwise accuracy)
VALSE counting adversarial: Multimodal Deep Learning (Accuracy (%), pairwise accuracy); Multimodal Text and Image Classification (Accuracy (%), pairwise accuracy)
VALSE counting balanced: Multimodal Deep Learning (Accuracy (%), pairwise accuracy); Multimodal Text and Image Classification (Accuracy (%), pairwise accuracy)
VALSE counting small numbers: Multimodal Deep Learning (Accuracy (%), pairwise accuracy); Multimodal Text and Image Classification (Accuracy (%), pairwise accuracy)
VALSE existence: Multimodal Deep Learning (Accuracy (%), pairwise accuracy); Multimodal Text and Image Classification (Accuracy (%), pairwise accuracy)
VALSE foil-it (noun phrases): Multimodal Deep Learning (Accuracy (%), pairwise accuracy); Multimodal Text and Image Classification (Accuracy (%), pairwise accuracy)
VALSE plurality: Multimodal Deep Learning (Accuracy (%), pairwise accuracy); Multimodal Text and Image Classification (Accuracy (%), pairwise accuracy)
VALSE spatial relations: Multimodal Deep Learning (Accuracy (%), pairwise accuracy); Multimodal Text and Image Classification (Accuracy (%), pairwise accuracy)

Statistics

Papers: 26
Benchmarks: 4

Tasks

Multimodal Deep Learning
Multimodal Text and Image Classification
image-sentence alignment