Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

3,275 dataset results

Grasp MultiObject

Robotic grasp dataset for multi-object, multi-grasp evaluation with RGB-D data. The dataset is annotated using the same protocol as the Cornell Dataset and can be used as a multi-object extension of it.

1 paper · 0 benchmarks · Images

YFCC100M Fine-Grained Geolocation

The YFCC100M Fine-Grained Geolocation dataset is a set of 36,146 YFCC100M images whose Flickr tags could be identified as corresponding to one of the labels in the iNaturalist 2017 dataset. The selected images satisfy the following criteria: each image has geolocation available, each image has at most one iNaturalist label, and at most ten examples were retained per label.

1 paper · 0 benchmarks · Images

T2 Guiding

T2 Guiding is a dataset of 1,000 images, each with six image labels. The images come from the Open Images Dataset (OID), and the dataset includes two sets of machine-generated labels for these images.

1 paper · 0 benchmarks · Images

LIV360SV (Liverpool 360 degree Street View)

The dataset contains 26,645 360-degree street-level images collected by cycling with a GoPro Fusion camera, recorded 14–18 January 2020. From these, 10,106 advertisements were identified and classified as food (1,335), alcohol (217), gambling (149), and other (8,405) (e.g., cars and broadband).

1 paper · 0 benchmarks · Images

ImagiFilter

ImagiFilter focuses on photographic and/or natural images, a very common use case in computer vision research. Annotations are provided for a coarse prediction task (photographic vs. non-photographic) and for smaller fine-grained prediction tasks in which the non-photographic class is broken down into five classes: maps, drawings, graphs, icons, and sketches.

1 paper · 0 benchmarks · Images

CROSS (Cross-Reference Omnidirectional Stitching IQA)

Cross-Reference Omnidirectional Stitching IQA is a novel omnidirectional image dataset containing stitched images as well as dual-fisheye images captured at the standard quarters of 0°, 90°, 180°, and 270°. In this manner, when evaluating the quality of an image stitched from a pair of fisheye images (e.g., 0° and 180°), the other pair of fisheye images (e.g., 90° and 270°) can be used as the cross-reference to provide ground-truth observations of the stitching regions.

1 paper · 0 benchmarks · Images

DensePose-Track

DensePose-Track is a dataset of videos where selected frames are annotated in the traditional DensePose manner.

1 paper · 0 benchmarks · Images

Ciona17

Ciona17 is a semantic segmentation dataset with pixel-level annotations pertaining to invasive species in a marine environment. Diverse outdoor illumination, a range of object shapes and colours, and severe occlusion provide a significant real-world challenge for the computer vision community.

1 paper · 0 benchmarks · Images

MineNav

MineNav is a synthetic dataset based on the sandbox game Minecraft. The dataset uses several plug-in programs to generate rendered image sequences with time-aligned depth maps, surface normal maps, and camera poses. Thanks to the game's large community, there is an extremely large number of 3D open-world environments; users can find suitable scenes for capture and build datasets from them, or build their own scenes in-game.

1 paper · 0 benchmarks · Images

ARVSU (Addressee Recognition in Visual Scenes with Utterances)

ARVSU contains a large body of image variations across visual scenes, with an annotated utterance and a corresponding addressee for each scenario.

1 paper · 0 benchmarks · Audio, Images

Event-Stream Dataset

Event-Stream Dataset is a robotic grasping dataset with 91 objects.

1 paper · 0 benchmarks · Images

SemanticUSL

SemanticUSL is a dataset for domain adaptation for LiDAR point cloud semantic segmentation. The dataset has the same data format and ontology as SemanticKITTI.

1 paper · 0 benchmarks · Images

WildestFaces

WildestFaces is tailored to study cross-domain recognition under a variety of adverse conditions.

1 paper · 0 benchmarks · Images

FAD (Face Attributes Dataset)

FAD is a dataset with roughly 200,000 facial-attribute labels for over 10,000 face images.

1 paper · 0 benchmarks · Images

HARRISON

HARRISON is a benchmark dataset for hashtag recommendation on real-world images from social networks, composed of 57,383 photos from Instagram with an average of 4.5 associated hashtags per photo.

1 paper · 0 benchmarks · Images

MVB (Multi View Baggage)

MVB (Multi View Baggage) is a dataset for the baggage re-identification (ReID) task, which differs in essential ways from person ReID. The features of MVB are three-fold. First, MVB is the first publicly released large-scale dataset of its kind, containing 4,519 baggage identities and 22,660 annotated baggage images along with their surface material labels. Second, all baggage images are captured by a specially designed multi-view camera system to handle pose variation and occlusion, so as to capture the 3D information of the baggage surface as completely as possible. Third, MVB exhibits remarkable inter-class similarity and intra-class dissimilarity: baggage items can have very similar appearance, while the data is collected in two real airport environments whose imaging conditions vary significantly.

1 paper · 0 benchmarks · Images

SESIV (SEmantic Salient Instance Video)

The SEmantic Salient Instance Video (SESIV) dataset is obtained by augmenting the DAVIS-2017 benchmark dataset with semantic ground truth for salient instance labels. The SESIV dataset consists of 84 high-quality video sequences with pixel-wise, per-frame ground-truth labels.

1 paper · 0 benchmarks · Images

SVLD (Social Vision and Language Dataset)

The social vision and language dataset is a large-scale multimodal dataset designed for research into social contextual learning.

1 paper · 0 benchmarks · Images, Texts

BigBIRD (Big Berkeley Instance Recognition Dataset)

BigBIRD is a 3D dataset of 125 objects, with images, point clouds, and RGB-D data provided for each object.

1 paper · 0 benchmarks · Images, Point cloud, RGB-D

CUHK Face Alignment Database

The CUHK Face Alignment Database is a dataset of 13,466 face images, among which 5,590 images are from LFW and the remaining 7,876 images are downloaded from the web. Each face is labeled with the positions of five facial keypoints. 10,000 images are used for training and the remaining 3,466 for validation.

1 paper · 0 benchmarks · Images
Page 113 of 164