Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

3,275 dataset results
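The faceted modality counts above follow a simple pattern: each dataset carries one or more modality tags, and each facet counts every dataset bearing that tag (so the per-modality totals can exceed the number of datasets). A minimal sketch in Python, using a few illustrative records drawn from entries on this page:

```python
from collections import Counter

# Illustrative records: each dataset carries a list of modality tags,
# mirroring the "Images, Texts" style metadata shown in the listing.
datasets = [
    {"name": "PuzzleVQA", "modalities": ["Images", "Texts"]},
    {"name": "AnoVox", "modalities": ["3D", "Images", "LiDAR", "RGB-D"]},
    {"name": "Kvasir-VQA", "modalities": ["Images", "Medical", "Tabular", "Texts"]},
]

def modality_counts(records):
    """Count how many datasets carry each modality tag; a dataset with
    several tags contributes to several facets."""
    return Counter(tag for r in records for tag in r["modalities"])

def filter_by_modality(records, tag):
    """Return only the datasets tagged with the given modality."""
    return [r for r in records if tag in r["modalities"]]
```

Applying a filter simply restricts the listing to records carrying the chosen tag, while the facet counts are recomputed over the full collection.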

PuzzleVQA

Large multimodal models extend the impressive capabilities of large language models by integrating multimodal understanding abilities. However, it is not clear how they can emulate the general intelligence and reasoning ability of humans. As recognizing patterns and abstracting concepts are key to general intelligence, we introduce PuzzleVQA, a collection of puzzles based on abstract patterns. With this dataset, we evaluate large multimodal models with abstract patterns based on fundamental concepts, including colors, numbers, sizes, and shapes. Through our experiments on state-of-the-art large multimodal models, we find that they are not able to generalize well to simple abstract patterns. Notably, even GPT-4V cannot solve more than half of the puzzles. To diagnose the reasoning challenges in large multimodal models, we progressively guide the models with our ground truth reasoning explanations for visual perception, inductive reasoning, and deductive reasoning. Our systematic analysis…

3 papers · 0 benchmarks · Images, Texts

IIW-400 (ImageInWords: IIW-400)

Please refer to https://github.com/google/imageinwords/blob/main/datasets/IIW-400/README.md

3 papers · 0 benchmarks · Images, Texts

Mono3DRefer

We sample 2,025 image frames from the original KITTI dataset for Mono3DRefer, containing 41,140 expressions in total and a vocabulary of 5,271 words.

3 papers · 0 benchmarks · 3D, Images, Texts

MULTI

MULTI-Benchmark is a benchmark for evaluating multimodal large language models (MLLMs). It is designed to test the understanding of complex tables and images, and reasoning over long context.

3 papers · 0 benchmarks · Images, Texts

MUTE (Multimodal Bengali Hateful Memes Dataset)

MUTE is the first open-source Bengali hateful meme dataset, consisting of around 4,200 memes annotated with two labels: hate and not hate.

3 papers · 0 benchmarks · Images, Texts

AnoVox

AnoVox is a large-scale benchmark for ANOmaly detection in autonomous driving. AnoVox incorporates multimodal sensor data and spatial VOXel ground truth, allowing methods to be compared independently of the sensors they use. AnoVox contains both content and temporal anomalies.

3 papers · 0 benchmarks · 3D, Images, LiDAR, RGB-D

SCapRepo (Google Play Screenshot Caption)

A screenshot-caption dataset containing 135k pairs of screenshots and captions extracted from Google Play.

3 papers · 0 benchmarks · Images, Texts

WFDD (Woven Fabric Defect Detection)

WFDD is a dataset for benchmarking anomaly detection methods with a focus on textile inspection. It includes 4101 woven fabric images categorized into 4 categories: grey cloth, grid cloth, yellow cloth, and pink flower. The first three classes are collected from the industrial production sites of WEIQIAO Textile, while the 'pink flower' class is gathered from the publicly available Cloth Flaw Dataset. Each category contains block-shape, point-like, and line-type defects with pixel-level annotations.

3 papers · 3 benchmarks · Images

UZLF (Leuven-Haifa High-Resolution Fundus Image Dataset for Retinal Blood Vessel Segmentation and Glaucoma Diagnosis)

The Leuven-Haifa dataset contains 240 disc-centered fundus images of 224 unique patients (75 patients with normal-tension glaucoma, 63 with high-tension glaucoma, 30 with other eye diseases, and 56 healthy controls) from the University Hospitals of Leuven. The arterioles and venules in these images were annotated by master's students in medicine and corrected by a senior annotator. All senior segmentation corrections are provided, as well as the junior segmentations of the test set. An open-source toolbox for the parametrization of segmentations was developed. Diagnosis, age, sex, and vascular parameters, as well as a quality score, are provided as metadata. Envisioned reuse includes the development or external validation of blood vessel segmentation algorithms, the study of the vasculature in glaucoma, and the development of glaucoma diagnosis algorithms. The dataset is available on the KU Leuven Research Data Repository (RDR).

3 papers · 2 benchmarks · Images

RLHF-V Dataset


3 papers · 0 benchmarks · Images, Texts

SensumSODF (Sensum Solid Oral Dosage Forms)

Given the unavailability of real-world pharmaceutical inspection-domain datasets, we have created the Sensum Solid Oral Dosage Forms (SensumSODF) dataset intended for research and evaluation purposes.

3 papers · 0 benchmarks · Images

Tiny ImageNetV2

Tiny ImageNetV2 is a subset of the ImageNetV2 (matched frequency) dataset by Recht et al. ("Do ImageNet Classifiers Generalize to ImageNet?"), with 2,000 images spanning all 200 classes of the Tiny ImageNet dataset. It is a test set constructed by collecting images from the classes shared by Tiny ImageNet and ImageNet. The images, resized to 64×64, were collected from Flickr after a decade of progress on the original ImageNet dataset, with a collection process designed to resemble the original ImageNet distribution. For further information on ImageNetV2, visit the original GitHub repository of ImageNetV2.

3 papers · 0 benchmarks · Images
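The subset construction described for Tiny ImageNetV2 — keeping only images whose class appears in both label sets — can be sketched as follows. The WordNet IDs and file names below are hypothetical placeholders, not actual dataset contents:

```python
# Hypothetical class IDs; the real construction intersects the 200
# Tiny ImageNet WordNet IDs with the ImageNetV2 label set.
tiny_imagenet_classes = {"n01443537", "n01629819", "n01641577"}

# Hypothetical (file, label) samples from a source test set.
imagenetv2_samples = [
    ("img_0001.jpg", "n01443537"),
    ("img_0002.jpg", "n09999999"),  # class not in Tiny ImageNet -> dropped
    ("img_0003.jpg", "n01641577"),
]

def joint_class_subset(samples, keep_classes):
    """Keep only samples whose label belongs to the shared class set."""
    return [(path, wnid) for path, wnid in samples if wnid in keep_classes]
```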

Kvasir-VQA (A Text-Image Pair GI Tract Dataset)

The Kvasir-VQA dataset is an extended dataset derived from the HyperKvasir and Kvasir-Instrument datasets, augmented with question-and-answer annotations. This dataset is designed to facilitate advanced machine learning tasks in gastrointestinal (GI) diagnostics, including image captioning, Visual Question Answering (VQA) and text-based generation of synthetic medical images.

3 papers · 0 benchmarks · Images, Medical, Tabular, Texts

P2GB

A benchmark designed to evaluate MLLMs’ proficiency in understanding inter-object relationships and textual content.

3 papers · 0 benchmarks · Images, Texts

Amazon Digital Music (Amazon Digital Music 5-core)

This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).

3 papers · 3 benchmarks · Images, Texts
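The "5-core" in the dataset name conventionally denotes a dense subset of the review data in which every remaining user and every remaining item has at least five reviews, obtained by iteratively discarding reviews below the threshold. A minimal sketch, assuming reviews are represented as (user, item) pairs:

```python
from collections import Counter

def k_core(reviews, k=5):
    """Iteratively drop reviews until every remaining user and item
    has at least k reviews (the 'k-core' of the user-item graph)."""
    reviews = list(reviews)
    while True:
        users = Counter(u for u, _ in reviews)
        items = Counter(i for _, i in reviews)
        kept = [(u, i) for u, i in reviews
                if users[u] >= k and items[i] >= k]
        if len(kept) == len(reviews):  # fixed point reached
            return kept
        reviews = kept
```

Iterating matters: removing one review can push another user or item below the threshold, so the filter repeats until a fixed point is reached.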

FindingEmo

FindingEmo is an image dataset containing annotations for 25k images, specifically tailored to Emotion Recognition. Contrary to existing datasets, it focuses on complex scenes depicting multiple people in various naturalistic, social settings, with images being annotated as a whole, thereby going beyond the traditional focus on faces or single individuals. Annotated dimensions include Valence, Arousal and Emotion label, with annotations gathered using Prolific. Together with the annotations, we release the list of URLs pointing to the original images, as well as all associated source code.

3 papers · 0 benchmarks · Images

EC-FUNSD

EC-FUNSD is introduced in [arXiv:2402.02379] as a benchmark of semantic entity recognition (SER) and entity linking (EL), designed for the entity-centric robustness evaluation of pre-trained text-and-layout models (PTLMs).

3 papers · 2 benchmarks · Images, Texts

ROOR

ROOR is a reading order prediction (ROP) benchmark which annotates layout reading order as ordering relations.

3 papers · 1 benchmark · Images, Texts

OAM-TCD

OAM-TCD is a dataset of around 5k aerial images from around the world to support robust tree detection algorithms. Full details can be found on the linked HuggingFace repository.

3 papers · 0 benchmarks · Images

SMILE-UHURA (Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiogram)

The human brain receives nutrients and oxygen through an intricate network of blood vessels. Pathology affecting small vessels, at the mesoscopic scale, represents a critical vulnerability within the cerebral blood supply and can lead to severe conditions, such as Cerebral Small Vessel Diseases. The advent of 7 Tesla MRI systems has enabled the acquisition of higher spatial resolution images, making it possible to visualise such vessels in the brain. However, the lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms. To address this, the SMILE-UHURA challenge was organised. This challenge, held in conjunction with ISBI 2023 in Cartagena de Indias, Colombia, aimed to provide a platform for researchers working on related topics. The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI.

3 papers · 0 benchmarks · Images, MRI, Medical
Page 90 of 164