Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2


MVSEC (Multi Vehicle Stereo Event Camera)

The Multi Vehicle Stereo Event Camera (MVSEC) dataset is a collection of data designed for the development of novel 3D perception algorithms for event-based cameras. Stereo event data is collected from car, motorbike, hexacopter and handheld platforms, and fused with lidar, IMU, motion capture and GPS to provide ground truth pose and depth images.

28 papers · 5 benchmarks · Images, LiDAR, Stereo

OASIS (Open Annotations of Single Image Surfaces)

A dataset for single-image 3D in the wild consisting of annotations of detailed 3D geometry for 140,000 images.

28 papers · 4 benchmarks · 3D, Images

SegTHOR (Segmentation of THoracic Organs at Risk)

SegTHOR (Segmentation of THoracic Organs at Risk) is a dataset dedicated to the segmentation of organs at risk (OARs) in the thorax, i.e. the organs surrounding the tumour that must be preserved from irradiations during radiotherapy. In this dataset, the OARs are the heart, the trachea, the aorta and the esophagus, which have varying spatial and appearance characteristics. The dataset includes 60 3D CT scans, divided into a training set of 40 and a test set of 20 patients, where the OARs have been contoured manually by an experienced radiotherapist.

28 papers · 0 benchmarks · Images, Medical

TextZoom

TextZoom is a super-resolution dataset that consists of paired Low Resolution – High Resolution scene text images. The images are captured in the wild by cameras with different focal lengths.

28 papers · 18 benchmarks · Images

MHIST (Minimalist Histopathology image analysis dataset)

The minimalist histopathology image analysis dataset (MHIST) is a binary classification dataset of 3,152 fixed-size images of colorectal polyps, each with a gold-standard label determined by the majority vote of seven board-certified gastrointestinal pathologists. MHIST also includes each image’s annotator agreement level. As a minimalist dataset, MHIST occupies less than 400 MB of disk space, and a ResNet-18 baseline can be trained to convergence on MHIST in just 6 minutes using approximately 3.5 GB of memory on an NVIDIA RTX 3090. As example use cases, the authors use MHIST to study natural questions that arise in histopathology image classification, such as how dataset size, network depth, transfer learning, and high-disagreement examples affect model performance.

28 papers · 1 benchmark · Biology, Images
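
The labelling rule described for MHIST (a majority vote over seven annotators, with an agreement level kept per image) can be sketched roughly as follows. This is an illustration only, not the authors' code: the function name `majority_label` and the placeholder labels "A" and "B" are assumptions made for the example.

```python
from collections import Counter


def majority_label(votes):
    """Return (label, agreement) for an odd number of binary votes.

    agreement is the number of annotators who chose the winning label.
    The function name and label values are hypothetical, for illustration.
    """
    if len(votes) % 2 == 0:
        raise ValueError("an odd number of votes is needed to avoid ties")
    label, count = Counter(votes).most_common(1)[0]
    return label, count


# Seven hypothetical annotator votes for one image.
votes = ["A", "A", "B", "A", "B", "A", "A"]
label, agreement = majority_label(votes)
print(label, agreement)
```

With seven binary votes the agreement level ranges from 4 (a narrow majority) to 7 (unanimous), which is one way the high-disagreement examples mentioned above could be identified.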

MIT-Adobe FiveK

The MIT-Adobe FiveK dataset consists of 5,000 photographs taken with SLR cameras by a set of different photographers. They are all in RAW format; that is, all the information recorded by the camera sensor is preserved. We made sure that these photographs cover a broad range of scenes, subjects, and lighting conditions. We then hired five photography students in an art school to adjust the tone of the photos. Each of them retouched all 5,000 photos using software dedicated to photo adjustment (Adobe Lightroom), on which they were extensively trained. We asked the retouchers to achieve visually pleasing renditions, akin to a postcard. The retouchers were compensated for their work.

28 papers · 4 benchmarks · Images

KITTI MOTS (KITTI Multi-Object Tracking and Segmentation (MOTS) Evaluation)

The Multi-Object Tracking and Segmentation (MOTS) benchmark [2] consists of 21 training sequences and 29 test sequences. It is based on the KITTI Tracking Evaluation 2012 and extends the annotations to the Multi-Object Tracking and Segmentation (MOTS) task. To this end, we added dense pixel-wise segmentation labels for every object. We evaluate submitted results using the metrics HOTA, CLEAR MOT, and MT/PT/ML. We rank methods by HOTA [1]. Our development kit and GitHub evaluation code provide details about the data format as well as utility functions for reading and writing the label files (adapted for the segmentation case). Evaluation is performed using the code from the TrackEval repository.

28 papers · 3 benchmarks · Images, Tracking, Videos

COCO 10% labeled data

Semi-Supervised Object Detection on COCO 10% labeled data

28 papers · 5 benchmarks · Images

MetaShift

MetaShift is a collection of 12,868 sets of natural images across 410 classes. It can be used to benchmark and evaluate how robust machine learning models are to data shifts.

28 papers · 0 benchmarks · Images

PSG Dataset

The PSG dataset has 48,749 images with 133 object classes (80 objects and 53 stuff) and 56 predicate classes. It annotates inter-segment relations based on COCO panoptic segmentation.

28 papers · 6 benchmarks · Images

THuman2.0 Dataset

THuman2.0 Dataset contains 500 high-quality human scans captured by a dense DSLR rig. For each scan, we provide the 3D model (.obj) and the corresponding texture map (.jpeg).

28 papers · 4 benchmarks · 3D, Images, RGB-D

XM 3600 (Crossmodal 3600)

Research in massively multilingual image captioning has been severely hampered by a lack of high-quality evaluation datasets. In this paper we present the Crossmodal-3600 dataset (XM3600 in short), a geographically-diverse set of 3600 images annotated with human-generated reference captions in 36 languages. The images were selected from across the world, covering regions where the 36 languages are spoken, and annotated with captions that achieve consistency in terms of style across all languages, while avoiding annotation artifacts due to direct translation. We apply this benchmark to model selection for massively multilingual image captioning models, and show strong correlation results with human evaluations when using XM3600 as golden references for automatic metrics.

28 papers · 0 benchmarks · Images, Texts

NoW Benchmark

The goal of this benchmark is to introduce a standard evaluation metric to measure the accuracy and robustness of 3D face reconstruction methods under variations in viewing angle, lighting, and common occlusions.

27 papers · 15 benchmarks · 3d meshes, Images

Silhouettes (CalTech 101 Silhouettes)

The Caltech 101 Silhouettes dataset consists of 4,100 training samples, 2,264 validation samples and 2,307 test samples. The dataset is based on CalTech 101 image annotations. Each image in the CalTech 101 data set includes a high-quality polygon outline of the primary object in the scene. To create the CalTech 101 Silhouettes data set, the authors center and scale each outline and render it on a DxD pixel image plane. The outline is rendered as a filled, black polygon on a white background. Many object classes exhibit silhouettes that have distinctive class-specific features. A relatively small number of classes, such as soccer ball, pizza, stop sign, and yin-yang, are indistinguishable based on shape but have been left in the data.

27 papers · 0 benchmarks · Images
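
The rendering step described for the silhouettes (center and scale a polygon outline, then fill it in black on a white DxD canvas) might look roughly like the sketch below. It is not the authors' pipeline: the function name `render_silhouette`, the `margin` parameter, the even-odd fill rule, and the pixel-center sampling are all assumptions chosen to keep the example dependency-free.

```python
def render_silhouette(outline, size=16, margin=1):
    """Rasterize a polygon outline as a filled shape on a size x size grid.

    outline: list of (x, y) vertices in arbitrary coordinates.
    Returns a list of rows; 0 = black (object), 1 = white (background).
    Name, margin and fill rule are illustrative assumptions.
    """
    xs = [x for x, _ in outline]
    ys = [y for _, y in outline]
    min_x, min_y = min(xs), min(ys)
    width, height = max(xs) - min_x, max(ys) - min_y
    extent = max(width, height) or 1.0
    # Uniform scale so the longer side fills the canvas minus the margin.
    scale = (size - 2 * margin) / extent
    # Offsets that center the scaled bounding box on the canvas.
    off_x = (size - width * scale) / 2
    off_y = (size - height * scale) / 2
    pts = [((x - min_x) * scale + off_x, (y - min_y) * scale + off_y)
           for x, y in outline]

    def inside(px, py):
        # Even-odd rule: count polygon edges crossed by a ray going right.
        hit = False
        for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
            if (y1 > py) != (y2 > py):
                x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                if px < x_cross:
                    hit = not hit
        return hit

    # Sample each pixel at its center.
    return [[0 if inside(c + 0.5, r + 0.5) else 1 for c in range(size)]
            for r in range(size)]


# A square outline rendered on a small canvas, printed as text.
image = render_silhouette([(0, 0), (10, 0), (10, 10), (0, 10)], size=8)
for row in image:
    print("".join("#" if v == 0 else "." for v in row))
```

Because only the shape survives this step, classes whose outlines are all roughly circular or octagonal end up with near-identical images, which is the ambiguity the description mentions.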

ISIC 2018 Task 1

The ISIC 2018 dataset was published by the International Skin Imaging Collaboration (ISIC) as a large-scale dataset of dermoscopy images. The Task 1 dataset belongs to the lesion segmentation challenge and includes 2,594 images.

27 papers · 1 benchmark · Images, Medical

2D-3D Match Dataset

2D-3D Match Dataset is a dataset of 2D-3D correspondences built from several existing 3D datasets of RGB-D scans. Specifically, the data from SceneNN and 3DMatch are used. The training dataset consists of 110 RGB-D scans, of which 56 scenes are from SceneNN and 54 scenes are from 3DMatch. The 2D-3D correspondence data is generated as follows. Given a 3D point which is randomly sampled from a 3D point cloud, a set of 3D patches from different scanning views are extracted. To find a 2D-3D correspondence, for each 3D patch, its 3D position is re-projected into all RGB-D frames for which the point lies in the camera frustum, taking occlusion into account. The corresponding local 2D patches around the re-projected point are extracted. In total, around 1.4 million 2D-3D correspondences are collected.

27 papers · 0 benchmarks · Images
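
The re-projection step described above (project a 3D point into an RGB-D frame, keep it only if it falls inside the camera frustum and is not occluded) can be sketched as below. The description does not say how occlusion is handled, so this example assumes a pinhole camera model and a simple comparison against the frame's depth map; the function names, the intrinsics `fx, fy, cx, cy`, and the tolerance `tol` are all hypothetical.

```python
def project_point(point_cam, fx, fy, cx, cy, width, height):
    """Project a 3D point in camera coordinates to pixel coordinates.

    Assumes a pinhole camera model (an assumption, not from the dataset).
    Returns (u, v), or None if the point is behind the camera or off-image.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    u = fx * x / z + cx
    v = fy * y / z + cy
    if 0 <= u < width and 0 <= v < height:
        return u, v
    return None


def is_visible(point_cam, depth_map, fx, fy, cx, cy, tol=0.05):
    """Frustum test plus a depth-map check against the RGB-D frame.

    depth_map is a list of rows of depths in the same unit as point_cam.
    The tolerance-based occlusion test is an illustrative assumption.
    """
    height, width = len(depth_map), len(depth_map[0])
    uv = project_point(point_cam, fx, fy, cx, cy, width, height)
    if uv is None:
        return False
    measured = depth_map[int(uv[1])][int(uv[0])]
    # Visible if the sensor depth at that pixel matches the point's depth.
    return measured > 0 and abs(measured - point_cam[2]) <= tol


# A tiny 4x4 frame whose depth is 2.0 everywhere (hypothetical values).
depth = [[2.0] * 4 for _ in range(4)]
print(is_visible((0.0, 0.0, 2.0), depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0))
print(is_visible((0.0, 0.0, 3.0), depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0))
```

In this sketch, a 2D patch would be cropped around `(u, v)` only in frames where `is_visible` returns true.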

Multi-Modal CelebA-HQ

Multi-Modal-CelebA-HQ is a large-scale face image dataset that has 30,000 high-resolution face images selected from the CelebA dataset by following CelebA-HQ. Each image has a high-quality segmentation mask, a sketch, a descriptive text, and a version with a transparent background.

27 papers · 7 benchmarks · Images, Texts

DeeperForensics-1.0

DeeperForensics-1.0 represents the largest face forgery detection dataset to date, with 60,000 videos constituted by a total of 17.6 million frames, 10 times larger than existing datasets of the same kind. The full dataset includes 48,475 source videos and 11,000 manipulated videos. The source videos are collected from 100 paid and consenting actors from 26 countries, and the manipulated videos are generated by a newly proposed many-to-many end-to-end face swapping method, DF-VAE. Seven types of real-world perturbations at five intensity levels are employed to ensure a larger scale and higher diversity. Source: https://github.com/EndlessSora/DeeperForensics-1.0

27 papers · 0 benchmarks · Images, Videos

IntrA

IntrA is an open-access 3D intracranial aneurysm dataset that supports point-based and mesh-based classification and segmentation models. The dataset can be used to diagnose intracranial aneurysms and to extract the aneurysm neck for a clipping operation in medicine, as well as in other areas of deep learning such as normal estimation and surface reconstruction.

27 papers · 11 benchmarks · Images

VQA-HAT (VQA Human Attention)

VQA-HAT (Human ATtention) is a dataset to evaluate the informative regions of an image depending on the question being asked about it. The dataset consists of human visual attention maps over the images in the original VQA dataset. It contains more than 60k attention maps.

27 papers · 0 benchmarks · Images