Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

IU X-Ray

IU X-ray (Demner-Fushman et al., 2016) is a set of chest X-ray images paired with their corresponding diagnostic reports. The dataset contains 7,470 pairs of images and reports. A simplified version of IU X-Ray through CPIR-MR prompting was presented at AAAI 2025: AAAI Article Link. Dataset Link: SMR IU X-Ray.

19 papers | 8 benchmarks

DDXPlus (DDXPlus: A New Dataset For Automatic Medical Diagnosis)

There has been rapidly growing interest in Automatic Symptom Detection (ASD) and Automatic Diagnosis (AD) systems in the machine learning research literature, aiming to assist doctors in telemedicine services. These systems are designed to interact with patients, collect evidence about their symptoms and relevant antecedents, and possibly make predictions about the underlying diseases. Doctors would review the interactions, including the evidence and the predictions, and collect additional information from patients if necessary before deciding on next steps. Despite recent progress in this area, an important piece of doctors' interactions with patients is missing in the design of these systems, namely the differential diagnosis. Its absence is largely due to the lack of datasets that include such information for models to train on. In this work, we present a large-scale synthetic dataset of roughly 1.3 million patients that includes a differential diagnosis, along with the ground truth

19 papers | 0 benchmarks | Medical, Texts

OVEN (Open-domain Visual Entity Recognition)

In this project, we formally present the task of Open-domain Visual Entity recognitioN (OVEN), where a model needs to link an image to a Wikipedia entity with respect to a text query. We construct OVEN-Wiki by repurposing 14 existing datasets, with all labels grounded in a single label space: Wikipedia entities. OVEN challenges models to select among six million possible Wikipedia entities, making it a general visual recognition benchmark with the largest number of labels.

19 papers | 1 benchmark | Images, Texts

WanJuan

WanJuan is a large-scale training corpus that includes multiple modalities. The dataset incorporates text, image-text, and video modalities, with a total volume exceeding 2TB.

19 papers | 0 benchmarks | Images, Texts, Videos

ImageNet-1k vs SUN

A benchmark dataset for out-of-distribution detection. ImageNet-1k is in-distribution, while SUN is out-of-distribution.

19 papers | 2 benchmarks | Images

InfiMM-Eval (Complex Open-ended Reasoning Evaluation for Multi-Modal Language Models)

Multi-modal Large Language Models (MLLMs) are increasingly prominent in the field of artificial intelligence. Although many benchmarks attempt to holistically evaluate MLLMs, they typically concentrate on basic reasoning tasks, often yielding only simple yes/no or multi-choice responses. These methods naturally lead to confusion and difficulties in conclusively determining the reasoning capabilities of MLLMs. To mitigate this issue, we manually curate the CORE-MM benchmark dataset, specifically designed for MLLMs with a focus on complex reasoning tasks. Our benchmark comprises three key reasoning categories: deductive, abductive, and analogical reasoning. The queries in our dataset are intentionally constructed to engage the reasoning capabilities of MLLMs in the process of generating answers. For a fair comparison across various MLLMs, we incorporate intermediate reasoning steps into our evaluation criteria. The CORE-MM benchmark consists of 279 manually curated reasoning questions, associate

19 papers | 5 benchmarks | Images

KinFaceW-I

The KinFaceW-I dataset contains 533 pairs of facial images of persons with a kin relation. Four kin relations are considered: father and daughter (F-D) with 134 pairs, father and son (F-S) with 156 pairs, mother and daughter (M-D) with 127 pairs, and mother and son (M-S) with 116 pairs. Each sample is composed of one parent face image and one child face image.

19 papers | 1 benchmark

CValues

CValues is a Chinese human values evaluation benchmark designed to assess the alignment of Chinese Large Language Models (LLMs) with human values.

19 papers | 0 benchmarks

TSS

A dataset of 400 image pairs.

19 papers | 1 benchmark

Amazon Clothing (Amazon Clothing 5-core)

19 papers | 1 benchmark

2010 i2b2/VA

2010 i2b2/VA is a biomedical dataset for relation classification and entity typing.

18 papers | 7 benchmarks | Texts

TempEval-3 (TempEval-3: events, times, and temporal relations)

Within the SemEval-2013 evaluation exercise, the TempEval-3 shared task aims to advance research on temporal information processing. It follows on from TempEval-1 and -2, with: a three-part structure covering temporal expression, event, and temporal relation extraction; a larger dataset; and new single measures to rank systems – in each task and in general.

18 papers | 27 benchmarks | Texts

CASIA-B

CASIA-B is a large multiview gait database created in January 2005. It covers 124 subjects, with gait data captured from 11 views. Three variations, namely view angle, clothing, and carrying condition changes, are considered separately. Besides the video files, we also provide human silhouettes extracted from the video files. Detailed information about Dataset B and an evaluation framework can be found in this paper.

18 papers | 5 benchmarks | Images, Videos

Yeast

The Yeast dataset consists of a protein-protein interaction network. Interaction detection methods have led to the discovery of thousands of interactions between proteins, and discerning relevance within large-scale data sets is important to present-day biology.

18 papers | 0 benchmarks | Biology, Graphs

Amazon-Book

18 papers | 8 benchmarks

Amazon Beauty (Amazon Beauty 5-core)

This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).

18 papers | 8 benchmarks | Images, Texts

Kennedy Space Center

Kennedy Space Center is a dataset for the classification of wetland vegetation at the Kennedy Space Center, Florida using hyperspectral imagery. Hyperspectral data were acquired over KSC on March 23, 1996 using JPL's Airborne Visible/Infrared Imaging Spectrometer.

18 papers | 18 benchmarks | Hyperspectral images, Images

Florence3D

The dataset, collected at the University of Florence during 2012, was captured using a Kinect camera. It includes 9 activities: wave, drink from a bottle, answer phone, clap, tight lace, sit down, stand up, read watch, bow. During acquisition, 10 subjects were asked to perform the above actions 2 or 3 times each. This resulted in a total of 215 activity samples.

18 papers | 0 benchmarks | 3D, Images

ICDAR 2017

ICDAR2017 is a dataset for scene text detection.

18 papers | 0 benchmarks | Images, Texts

Replay-Mobile

The Replay-Mobile Database for face spoofing consists of 1190 video clips of photo and video attack attempts against 40 clients, under different lighting conditions. These videos were recorded with devices on the market at the time: an iPad Mini2 (running iOS) and an LG-G4 smartphone (running Android). The database was produced at the Idiap Research Institute (Switzerland) in collaboration with the Galician Research and Development Center in Advanced Telecommunications - Gradiant (Spain).

18 papers | 0 benchmarks | Images, Videos