Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)
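The per-modality counts above sum to well over the 3,275 total because a single dataset can carry several modality tags (for example, AerialMPT is tagged both Images and Videos). A minimal sketch of how such overlapping counts are derived, using a hypothetical record structure rather than any real catalog API:

```python
from collections import Counter

# Hypothetical records mirroring the catalog entries below: each dataset
# carries one or more modality tags, so per-modality counts overlap and
# their sum exceeds the number of datasets.
datasets = [
    {"name": "ISP-AD", "modalities": ["Images"]},
    {"name": "AerialMPT", "modalities": ["Images", "Videos"]},
    {"name": "BIRDeep", "modalities": ["Audio", "Biology", "Environment", "Images"]},
]

def modality_counts(records):
    """Count how many datasets carry each modality tag."""
    return Counter(tag for r in records for tag in r["modalities"])

counts = modality_counts(datasets)
print(counts["Images"])  # → 3: every sample record above is tagged Images
print(counts["Videos"])  # → 1: only AerialMPT carries the Videos tag
```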

3,275 dataset results

ISP-AD (The Industrial Screen Printing Anomaly Detection Dataset)

The ISP-AD Dataset is a large-scale anomaly detection dataset, representing a real-world industrial use case. It contains 312,674 fault-free and 246,375 defective samples, including 245,664 synthetic defects and 711 real defects collected on the factory floor.

1 paper · 0 benchmarks · Images

MARIO (Monitoring Age-related Macular Degeneration Progression In Optical Coherence Tomography)

MICCAI Challenge 2024

1 paper · 0 benchmarks · Images

HDRT (HDRT Dataset)

The HDRT dataset is a large-scale dataset designed for infrared-guided high dynamic range (HDR) imaging. It includes aligned infrared (IR), standard dynamic range (SDR), and HDR images to facilitate research in multi-modal fusion, HDR imaging, and related areas.

1 paper · 0 benchmarks · Images

AerialMPT

AerialMPT is a dataset for pedestrian tracking in aerial image sequences and presents real-world challenges for MOT algorithms such as low frame rate, small moving objects, and complex backgrounds. AerialMPT consists of 14 sequences and 307 frames with an average size of 425 × 358 pixels. The images were acquired by DLR's 4K camera system from altitudes ranging from 600 m to 1400 m, resulting in spatial resolutions (GSDs) ranging from 8 cm/pixel to 13 cm/pixel. In a post-processing step, the images were co-registered, geo-referenced, and cropped to each region of interest, yielding sequences at 2 fps. The images were acquired during different flight campaigns between 2016 and 2017, over different scenes containing pedestrians, with varying crowd densities and movement complexities.

1 paper · 0 benchmarks · Images, Videos

Pick-a-Filter

Pick-a-Filter is a semi-synthetic dataset constructed from Pick-a-Pic v1 to measure how well text-to-image models adapt to heterogeneous preferences. Users from Pick-a-Pic v1 are randomly assigned to one of two groups: those who prefer blue, cooler image tones (G1) and those who prefer red, warmer image tones (G2). After constructing this split, a group-specific filtering procedure is applied to construct the dataset.

1 paper · 0 benchmarks · Images

Banapple

The dataset consists of images of bananas and apples, collected from Flickr under Creative Commons licenses. The images show bananas and apples with variations in color, placement, size, and background. The motivation for constructing this dataset stems from studies in cognitive science, where human perception is investigated using examples with discrete properties of bananas and apples. It can be used for explainable/interpretable image classification, as in: Dimas, G., Cholopoulou, E., & Iakovidis, D. K. (2023). E pluribus unum interpretable convolutional neural networks. Scientific Reports, 13(1), 11421. https://www.nature.com/articles/s41598-023-38459-1

1 paper · 0 benchmarks · Images

VETRA

VETRA is a dataset for vehicle tracking in aerial image sequences and presents unique challenges such as low frame rates, small and fast-moving objects, and high camera movement. These characteristics allow extended tracking of numerous vehicles with varying motion behaviors over large areas and pose new challenges for MOT algorithms. VETRA consists of 52 image sequences captured by airplanes and helicopters using DLR's 3K and 4K camera systems. The acquisition sites are located in Germany and Austria. In addition to the standard training, validation, and test sets, VETRA offers a second test set specifically designed for large-area monitoring (LAM). The LAM sequences are recorded over 7 rural roads and motorways with a fixed camera speed and configuration. Each road section is captured at 4 different times of day, enabling the performance of MOT algorithms to be evaluated under different traffic loads in a static environment.

1 paper · 0 benchmarks · Images, RGB Video, Videos

LIRCAD (Inria Liver vessels subbranch anatomical nomenclature labels - "LIRCAD")

The structure of the dataset is as follows:

1 paper · 0 benchmarks · Images, Medical

MeshFleet (MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling)

MeshFleet is a filtered and annotated dataset of high-quality vehicles derived from Objaverse-XL. It contains the SHA-256 hashes of the objects together with consistent object captions and vehicle parameters.

1 paper · 0 benchmarks · 3D, Images

Songdo Traffic (Songdo Traffic: High Accuracy Georeferenced Vehicle Trajectories from a Large-Scale Study in a Smart City)

The Songdo Traffic dataset delivers precisely georeferenced vehicle trajectories captured from high-altitude bird's-eye-view (BeV) drone footage over the Songdo International Business District, South Korea. Comprising approximately 700,000 unique trajectories, it is one of the most extensive aerial traffic datasets publicly available. Trajectories are sampled at 29.97 points per second, an unusually high temporal resolution that enables fine-grained urban mobility analysis.

1 paper · 0 benchmarks · Images, Tabular, Time series, Tracking, Videos

Songdo Vision (Songdo Vision: Vehicle Annotations from High-Altitude BeV Drone Imagery in a Smart City)

The Songdo Vision dataset provides high-resolution (4K, 3840×2160 pixels) RGB images annotated with categorized axis-aligned bounding boxes (BBs) for vehicle detection from a high-altitude bird's-eye-view (BeV) perspective. Captured over the Songdo International Business District, South Korea, the dataset consists of 5,419 annotated video frames featuring approximately 300,000 vehicle instances categorized into four vehicle classes.

1 paper · 20 benchmarks · Images, Tabular

BIRDeep (BIRDeep_AudioAnnotations)

The BIRDeep Audio Annotations dataset is a collection of bird vocalizations from Doñana National Park, Spain. It was created as part of the BIRDeep project, which aims to optimize the detection and classification of bird species in audio recordings using deep learning techniques. The dataset is intended for use in training and evaluating models for bird vocalization detection and identification.

1 paper · 0 benchmarks · Audio, Biology, Environment, Images

MIKASA-Robo Dataset


1 paper · 0 benchmarks · Actions, Images, Replay data

NCSE v2.0 (NCSE v2.0: A Dataset of OCR-Processed 19th Century English Newspapers)

The NCSE v2.0 is a digitized collection of six 19th-century English periodicals.

1 paper · 0 benchmarks · Images, Texts

BLN600 (BLN600: A Parallel Corpus of Machine/Human Transcribed Nineteenth Century Newspaper Texts)

A publicly available corpus of nineteenth-century newspaper text focused on crime in London, derived from the Gale British Library Newspapers (BLN) corpus, parts 1 and 2. The corpus comprises 600 newspaper excerpts; for each excerpt it contains the original source image, the machine transcription of that image as found in the BLN, and a gold-standard manual transcription.

1 paper · 0 benchmarks · Images, Texts

GroundCap

GroundCap is a novel grounded image captioning dataset derived from MovieNet, containing 52,350 movie frames with detailed grounded captions. The dataset uniquely features an ID-based system that maintains object identity throughout captions, enables tracking of object interactions, and grounds not only objects but also actions and locations in the scene.

1 paper · 0 benchmarks · Images, Texts

Spiideo SoccerNet SynLoc

Synthetic soccer players rendered on top of real-world 4K stadium images, each covering half a pitch. Ground-truth annotations include the precise location of players on the pitch, the 3D location of the player pelvis, and image bounding boxes.

1 paper · 18 benchmarks · 3D, Images

LSDBench (Long-video Sampling Dilemma Benchmark)

A benchmark that focuses on the sampling dilemma in long-video tasks. The LSDBench dataset is designed to evaluate the sampling efficiency of long-video VLMs. It consists of multiple-choice question-answer pairs based on hour-long videos, focusing on dense and short-duration actions with high Necessary Sampling Density (NSD).

1 paper · 0 benchmarks · Actions, Images, Texts, Videos

AWMM-100k

Existing multi-modality image fusion datasets lack comprehensive coverage of adverse weather scenarios. To address this, we introduce AWMM-100k, a benchmark constructed by selecting samples from RoadScene, MSRS, M3FD, and LLVIP and applying controlled degradation to simulate adverse weather conditions. Combined with real-world data captured using a DJI M30T drone equipped with high-resolution visible and thermal cameras, AWMM-100k comprises 187,699 images covering rain, haze, and snow, each categorized into heavy, medium, and light intensities. The dataset supports research on multi-modality image fusion under challenging weather conditions and is also applicable to image restoration tasks such as dehazing, deraining, and desnowing. We thank the original datasets for their contributions. We believe this dataset significantly expands the scope of multimodal image processing and computer vision research, facilitating advancements in both image fusion and image restoration.

1 paper · 0 benchmarks · Images

GeoJEPAD (GeoJEPA Dataset)

GeoJEPAD is a multimodal dataset combining OpenStreetMap (OSM) data (attributes and geometries) with high-resolution aerial imagery from diverse urban areas.

Sourced from NAIP and OSM, then processed, tiled, and cropped. Geometries and relations are represented as graphs with optional visibility edges.

1 paper · 0 benchmarks · Graphs, Images, Texts