TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images3,275
  • Texts3,148
  • Videos1,019
  • Audio486
  • Medical395
  • 3D383
  • Time series298
  • Graphs285
  • Tabular271
  • Speech199
  • RGB-D192
  • Environment148
  • Point cloud135
  • Biomedical123
  • LiDAR95
  • RGB Video87
  • Tracking78
  • Biology71
  • Actions68
  • 3d meshes65
  • Tables52
  • Music48
  • EEG45
  • Hyperspectral images45
  • Stereo44
  • MRI39
  • Physics32
  • Interactive29
  • Dialog25
  • Midi22
  • 6D17
  • Replay data11
  • Financial10
  • Ranking10
  • Cad9
  • fMRI7
  • Parallel6
  • Lyrics2
  • PSG2
Clear filter

3,275 dataset results

Color FERET

The color FERET database is a dataset for face recognition. It contains 11,338 color images of size 512×768 pixels captured in a semi-controlled environment with 13 different poses from 994 subjects.

36 papers19 benchmarksImages

BraTS 2013

BRATS 2013 is a brain tumor segmentation dataset consists of synthetic and real images, where each of them is further divided into high-grade gliomas (HG) and low-grade gliomas (LG). There are 25 patients with both synthetic HG and LG images and 20 patients with real HG and 10 patients with real LG images. For each patient, FLAIR, T1, T2, and post-Gadolinium T1 magnetic resonance (MR) image sequences are available.

36 papers0 benchmarksImages, MRI, Medical

DRealSR (Diverse Real-world image Super-Resolution)

DRealSR establishes a Super Resolution (SR) benchmark with diverse real-world degradation processes, mitigating the limitations of conventional simulated image degradation.

36 papers9 benchmarksImages

Fashion-Gen

Fashion-Gen consists of 293,008 high definition (1360 x 1360 pixels) fashion images paired with item descriptions provided by professional stylists. Each item is photographed from a variety of angles.

36 papers0 benchmarksImages, Texts

SciTSR

SciTSR is a large-scale table structure recognition dataset, which contains 15,000 tables in PDF format and their corresponding structure labels obtained from LaTeX source files.

36 papers0 benchmarksImages

VisualMRC (VisualMRC: Machine Reading Comprehension on Document Images)

VisualMRC is a visual machine reading comprehension dataset that proposes a task: given a question and a document image, a model produces an abstractive answer.

36 papers2 benchmarksImages, Texts

HO-3D v2

A hand-object interaction dataset with 3D pose annotations of hand and object. The dataset contains 66,034 training images and 11,524 test images from a total of 68 sequences. The sequences are captured in multi-camera and single-camera setups and contain 10 different subjects manipulating 10 different objects from YCB dataset. The annotations are automatically obtained using an optimization algorithm. The hand pose annotations for the test set are withheld and the accuracy of the algorithms on the test set can be evaluated with standard metrics using the CodaLab challenge submission(see project page). The object pose annotations for the test and train set are provided along with the dataset.

36 papers71 benchmarksImages, RGB-D

STPLS3D

Our project (STPLS3D) aims to provide a large-scale aerial photogrammetry dataset with synthetic and real annotated 3D point clouds for semantic and instance segmentation tasks.

36 papers10 benchmarks3D, Images, Point cloud

LSUI (Large Scale Underwater Image Dataset)

We released a large-scale underwater image (LSUI) dataset including 5004 image pairs, which involve richer underwater scenes (lighting conditions, water types and target categories) and better visual quality reference images than the existing ones.

36 papers2 benchmarksImages

V2XSet

A large-scale V2X perception dataset using CARLA and OpenCDA

36 papers24 benchmarks3D, Images, LiDAR

TEACh (Task-driven Embodied Agents that Chat)

Robots operating in human spaces must be able to engage in natural language interaction with people, both understanding and executing instructions, and using conversation to resolve ambiguity and recover from mistakes. To study this, we introduce TEACh, a dataset of over 3,000 human--human, interactive dialogues to complete household tasks in simulation. A Commander with access to oracle information about a task communicates in natural language with a Follower. The Follower navigates through and interacts with the environment to complete tasks varying in complexity from "Make Coffee" to "Prepare Breakfast", asking questions and getting additional information from the Commander. We propose three benchmarks using TEACh to study embodied intelligence challenges, and we evaluate initial models' abilities in dialogue understanding, language grounding, and task execution.

36 papers0 benchmarksDialog, Environment, Images

InfoSeek (Visual Information Seeking)

In this project, we introduce InfoSeek, a visual question answering dataset tailored for information-seeking questions that cannot be answered with only common sense knowledge. Using InfoSeek, we analyze various pre-trained visual question answering models and gain insights into their characteristics. Our findings reveal that state-of-the-art pre-trained multi-modal models (e.g., PaLI-X, BLIP2, etc.) face challenges in answering visual information-seeking questions, but fine-tuning on the InfoSeek dataset elicits models to use fine-grained knowledge that was learned during their pre-training.

36 papers2 benchmarksImages, Texts

mebeblurf

Matanga Darknet — 2025 Access Guide

35 papers25 benchmarksImages

BirdSong

The BirdSong dataset consists of audio recordings of bird songs at the H. J. Andrews (HJA) Experimental Forest, using unattended microphones. The goal of the dataset is to provide data to automatically identify the species of bird responsible for each utterance in these recordings. The dataset contains 548 10-seconds audio recordings.

35 papers0 benchmarksImages, Videos

COVID-19 Image Data Collection

Contains hundreds of frontal view X-rays and is the largest public resource for COVID-19 image and prognostic data, making it a necessary resource to develop and evaluate tools to aid in the treatment of COVID-19.

35 papers1 benchmarksBiology, Biomedical, Images, Medical

Google Landmarks Dataset v2

This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into 3 sets of images: train, index and test

35 papers0 benchmarksImages

MuCo-3DHP

MuCo-3DHP is a large scale training data set showing real images of sophisticated multi-person interactions and occlusions.

35 papers0 benchmarksImages

PST900

PST900 is a dataset of 894 synchronized and calibrated RGB and Thermal image pairs with per pixel human annotations across four distinct classes from the DARPA Subterranean Challenge.

35 papers4 benchmarksImages

Spot-the-diff

Spot-the-diff is a dataset consisting of 13,192 image pairs along with corresponding human provided text annotations stating the differences between the two images.

35 papers0 benchmarksImages, Texts

SVT (Street View Text Dataset)

The Street View Text (SVT) dataset was harvested from Google Street View. Image text in this data exhibits high variability and often has low resolution. In dealing with outdoor street level imagery, we note two characteristics. (1) Image text often comes from business signage and (2) business names are easily available through geographic business searches. These factors make the SVT set uniquely suited for word spotting in the wild: given a street view image, the goal is to identify words from nearby businesses.

35 papers3 benchmarksImages
PreviousPage 28 of 164Next