TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images3,275
  • Texts3,148
  • Videos1,019
  • Audio486
  • Medical395
  • 3D383
  • Time series298
  • Graphs285
  • Tabular271
  • Speech199
  • RGB-D192
  • Environment148
  • Point cloud135
  • Biomedical123
  • LiDAR95
  • RGB Video87
  • Tracking78
  • Biology71
  • Actions68
  • 3d meshes65
  • Tables52
  • Music48
  • EEG45
  • Hyperspectral images45
  • Stereo44
  • MRI39
  • Physics32
  • Interactive29
  • Dialog25
  • Midi22
  • 6D17
  • Replay data11
  • Financial10
  • Ranking10
  • Cad9
  • fMRI7
  • Parallel6
  • Lyrics2
  • PSG2
Clear filter

3,275 dataset results

CIRCO (Composed Image Retrieval on Common Objects in context)

CIRCO (Composed Image Retrieval on Common Objects in context) is an open-domain benchmarking dataset for Composed Image Retrieval (CIR) based on real-world images from COCO 2017 unlabeled set. It is the first CIR dataset with multiple ground truths and aims to address the problem of false negatives in existing datasets. CIRCO comprises a total of 1020 queries, randomly divided into 220 and 800 for the validation and test set, respectively, with an average of 4.53 ground truths per query.

35 papers8 benchmarksImages, Texts

ARO

Attribution, Relation, and Order (ARO) benchmark to systematically evaluate the ability of VLMs to understand different types of relationships, attributes, and order information. ARO consists of Visual Genome Attribution, to test the understanding of objects' properties; Visual Genome Relation, to test for relational understanding; and COCO-Order & Flickr30k-Order, to test for order sensitivity in VLMs. ARO is orders of magnitude larger than previous benchmarks of compositionality, with more than 50,000 test cases.

35 papers0 benchmarksImages, Texts

FinTabNet

This dataset contains complex tables from the annual reports of S&P 500 companies with detailed table structure annotations to help table structure recognition and table data extraction. The dataset consists of 89,646 pages comprising 112,887 tables with cell structure annotated from IBM Research.

35 papers0 benchmarksFinancial, Images, Tabular

ECSSD (Extended Complex Scene Saliency Dataset)

The Extended Complex Scene Saliency Dataset (ECSSD) is comprised of complex scenes, presenting textures and structures common to real-world images. ECSSD contains 1,000 intricate images and respective ground-truth saliency maps, created as an average of the labeling of five human participants.

34 papers52 benchmarksImages

SUIM (Segmentation of Underwater IMagery)

The Segmentation of Underwater IMagery (SUIM) dataset contains over 1500 images with pixel annotations for eight object categories: fish (vertebrates), reefs (invertebrates), aquatic plants, wrecks/ruins, human divers, robots, and sea-floor. The images have been rigorously collected during oceanic explorations and human-robot collaborative experiments, and annotated by human participants.

34 papers8 benchmarksImages

ACRONYM

A dataset for robot grasp planning based on physics simulation. The dataset contains 17.7M parallel-jaw grasps, spanning 8872 objects from 262 different categories, each labeled with the grasp result obtained from a physics simulator.

34 papers0 benchmarksImages

EgoHands

The EgoHands dataset contains 48 Google Glass videos of complex, first-person interactions between two people. The main intention of this dataset is to enable better, data-driven approaches to understanding hands in first-person computer vision. The dataset offers

34 papers0 benchmarksImages, Videos

UMDFaces

UMDFaces is a face dataset divided into two parts:

34 papers0 benchmarksImages

PolyU

PolyU Dataset is a large dataset of real-world noisy images with reasonably obtained corresponding “ground truth” images. The basic idea is to capture the same and unchanged scene for many (e.g., 500) times and compute their mean image, which can be roughly taken as the “ground truth” image for the real-world noisy images. The rational of this strategy is that for each pixel, the noise is generated randomly larger or smaller than 0. Sampling the same pixel many times and computing the average value will approximate the truth pixel value and alleviate significantly the noise.

34 papers6 benchmarksImages

IBims-1 (Independent benchmark images and matched scans v1)

iBims-1 (independent Benchmark images and matched scans - version 1) is a new high-quality RGB-D dataset, especially designed for testing single-image depth estimation (SIDE) methods. A customized acquisition setup, composed of a digital single-lens reflex (DSLR) camera and a high-precision laser scanner was used to acquire high-resolution images and highly accurate depth maps of diverse indoors scenarios.

34 papers14 benchmarksImages

CLEAR

CLEAR is a continual image classification benchmark dataset with a natural temporal evolution of visual concepts in the real world that spans a decade (2004-2014). CLEAR is built from existing large-scale image collections (YFCC100M) through a novel and scalable low-cost approach to visio-linguistic dataset curation. The pipeline makes use of pretrained vision language models (e.g. CLIP) to interactively build labeled datasets, which are further validated with crowd-sourcing to remove errors and even inappropriate images (hidden in original YFCC100M). The major strength of CLEAR over prior CL benchmarks is the smooth temporal evolution of visual concepts with real-world imagery, including both high-quality labeled data along with abundant unlabeled samples per time period for continual semi-supervised learning.

34 papers0 benchmarksImages

CRACK500

For the details of the work, the readers are refer to the paper "Feature Pyramid and Hierarchical Boosting Network for Pavement Crack Detection" (FPHB), T-ITS 2019. You can find the paper in https://www.researchgate.net/publication/330244656_Feature_Pyramid_and_Hierarchical_Boosting_Network_for_Pavement_Crack_Detection or https://arxiv.org/abs/1901.06340.

34 papers0 benchmarksImages

Deblur-NeRF

This dataset focus on two blur types: camera motion blur and defocus blur. For each type of blur we synthesize $5$ scenes using Blender. We manually place multi-view cameras to mimic real data capture. To render images with camera motion blur, we randomly perturb the camera pose, and then linearly interpolate poses between the original and perturbed poses for each view. We render images from interpolated poses and blend them in linear RGB space to generate the final blurry images. For defocus blur, we use the built-in functionality to render depth-of-field images. We fix the aperture and randomly choose a focus plane between the nearest and furthest depth.

34 papers0 benchmarksImages

CIHP (Crowd Instance-level Human Parsing)

The Crowd Instance-level Human Parsing (CIHP) dataset has 38,280 diverse human images. Each image in CIHP is labeled with pixel-wise annotations on 20 categories and instance-level identification. The dataset can be used for the human part segmentation task.

33 papers2 benchmarksImages

WMCA (Wide Multi Channel Presentation Attack)

The Wide Multi Channel Presentation Attack (WMCA) database consists of 1941 short video recordings of both bonafide and presentation attacks from 72 different identities. The data is recorded from several channels including color, depth, infra-red, and thermal.

33 papers2 benchmarksImages, RGB-D, Videos

CoSal2015

Cosal2015 is a large-scale dataset for co-saliency detection which consists of 2,015 images of 50 categories, and each group suffers from various challenging factors such as complex environments, occlusion issues, target appearance variations and background clutters, etc. All these increase the difficulty for accurate co-saliency detection.

33 papers42 benchmarksImages

3D Shapes Dataset

3dshapes is a dataset of 3D shapes procedurally generated from 6 ground truth independent latent factors. These factors are floor colour, wall colour, object colour, scale, shape and orientation.

33 papers0 benchmarksImages

SEVIR (Storm EVent ImagRy)

SEVIR is an annotated, curated and spatio-temporally aligned dataset containing over 10,000 weather events that each consist of 384 km x 384 km image sequences spanning 4 hours of time. Images in SEVIR were sampled and aligned across five different data types: three channels (C02, C09, C13) from the GOES-16 advanced baseline imager, NEXRAD vertically integrated liquid mosaics, and GOES-16 Geostationary Lightning Mapper (GLM) flashes. Many events in SEVIR were selected and matched to the NOAA Storm Events database so that additional descriptive information such as storm impacts and storm descriptions can be linked to the rich imagery provided by the sensors.

33 papers4 benchmarksImages

ClevrTex

ClevrTex is a new benchmark designed as the next challenge to compare, evaluate and analyze algorithms for unsupervised multi-object segmentation. ClevrTex features synthetic scenes with diverse shapes, textures and photo-mapped materials, created using physically based rendering techniques.

33 papers4 benchmarksImages

nuScenes-C

🤖 Robo3D - The nuScenes-C Benchmark nuScenes-C is an evaluation benchmark heading toward robust and reliable 3D perception in autonomous driving. With it, we probe the robustness of 3D detectors and segmentors under out-of-distribution (OoD) scenarios against corruptions that occur in the real-world environment. Specifically, we consider natural corruptions happen in the following cases:

33 papers9 benchmarksImages, Point cloud
PreviousPage 29 of 164Next