Datasets

3,275 machine learning datasets

3,275 dataset results

ARCTIC (Articulated Objects in Free-form Hand Interaction)

ARCTIC is a dataset of free-form interactions of hands and articulated objects. ARCTIC has 1.2M images paired with accurate 3D meshes for both hands and for objects that move and deform over time. The dataset also provides hand-object contact information.

23 papers0 benchmarks3d meshes, Images

ImageNet-W (ImageNet-Watermark)

ImageNet-W(atermark) is a test set to evaluate models’ reliance on the newly found watermark shortcut in ImageNet, which is used to predict the carton class. ImageNet-W is created by overlaying transparent watermarks on the ImageNet validation set. Two metrics are used to evaluate watermark shortcut reliance: (1) IN-W Gap: the top-1 accuracy drop from ImageNet to ImageNet-W, (2) Carton Gap: carton class accuracy increase from ImageNet to ImageNet-W. Combining ImageNet-W with previous out-of-distribution variants of ImageNet (e.g., Stylized ImageNet, ImageNet-R, ImageNet-9) forms a comprehensive suite of multi-shortcut evaluation on ImageNet.

23 papers0 benchmarksImages

DigestPath

Introduced by Da et al. in DigestPath: a Benchmark Dataset with Challenge Review for the Pathological Detection and Segmentation of Digestive-System

23 papers2 benchmarksImages, Medical

PIE-Bench (Prompt-based Image Editing Benchmark)

PIE-Bench comprises 700 images featuring 10 distinct editing types. Images are evenly distributed in natural and artificial scenes (e.g., paintings) among four categories: animal, human, indoor, and outdoor. Each image in PIE-Bench includes five annotations: source image prompt, target image prompt, editing instruction, main editing body, and the editing mask. Notably, the editing mask annotation (indicating the anticipated editing region) is crucial in accurate metrics computations as we expect the editing to only occur within a designated area.

23 papers16 benchmarksImages, Texts

CUFSF (CUHK Face Sketch FERET Database)

The CUHK Face Sketch FERET (CUFSF) is a dataset for research on face sketch synthesis and face sketch recognition. It contains two types of face images: photo and sketch. Total 1,194 images (one image per subject) were collected with lighting variations from the FERET dataset. For each subject, a sketch is drawn with shape exaggeration.

22 papers24 benchmarksImages

SKU110K

The Sku110k dataset provides 11,762 images with more than 1.7 million annotated bounding boxes captured in densely packed scenarios, including 8,233 images for training, 588 images for validation, and 2,941 images for testing. There are around 1,733,678 instances in total. The images are collected from thousands of supermarket stores and are of various scales, viewing angles, lighting conditions, and noise levels. All the images are resized into a resolution of one megapixel. Most of the instances in the dataset are tightly packed and typically of a certain orientation in the rage of [−15∘, 15∘].

22 papers0 benchmarksImages

LOCATA

The LOCATA dataset is a dataset for acoustic source localization. It consists of real-world ambisonic speech recordings with optically tracked azimuth-elevation labels.

22 papers0 benchmarksAudio, Images

BAR (Biased Action Recognition)

Biased Action Recognition (BAR) dataset is a real-world image dataset categorized as six action classes which are biased to distinct places. The authors settle these six action classes by inspecting imSitu, which provides still action images from Google Image Search with action and place labels. In detail, the authors choose action classes where images for each of these candidate actions share common place characteristics. At the same time, the place characteristics of action class candidates should be distinct in order to classify the action only from place attributes. The select pairs are six typical action-place pairs: (Climbing, RockWall), (Diving, Underwater), (Fishing, WaterSurface), (Racing, APavedTrack), (Throwing, PlayingField),and (Vaulting, Sky).

22 papers2 benchmarksImages

COWC (Cars Overhead With Context)

The Cars Overhead With Context (COWC) data set is a large set of annotated cars from overhead. It is useful for training a device such as a deep neural network to learn to detect and/or count cars.

22 papers0 benchmarksImages

IMDb-Face

IMDb-Face is large-scale noise-controlled dataset for face recognition research. The dataset contains about 1.7 million faces, 59k identities, which is manually cleaned from 2.0 million raw images. All images are obtained from the IMDb website.

22 papers0 benchmarksImages

JHU-CROWD

(JHU-CROWD) a crowd counting dataset that contains 4,250 images with 1.11 million annotations. This dataset is collected under a variety of diverse scenarios and environmental conditions. Specifically, the dataset includes several images with weather-based degradations and illumination variations in addition to many distractor images, making it a very challenging dataset. Additionally, the dataset consists of rich annotations at both image-level and head-level.

22 papers0 benchmarksImages

WIDER (Web Image Dataset for Event Recognition)

WIDER is a dataset for complex event recognition from static images. As of v0.1, it contains 61 event categories and around 50574 images annotated with event class labels.

22 papers1 benchmarksImages

CxC (Crisscrossed Captions)

Crisscrossed Captions (CxC) contains 247,315 human-labeled annotations including positive and negative associations between image pairs, caption pairs and image-caption pairs.

22 papers1 benchmarksImages, Texts

PhotoShape

The PhotoShape dataset consists of photorealistic, relightable, 3D shapes produced by the work proposed in the work of Park et al. (2021).

22 papers2 benchmarksImages

EasyCom

The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the cocktail party effect from an augmented-reality (AR) -motivated multi-sensor egocentric world view. The dataset contains AR glasses egocentric multi-channel microphone array audio, wide field-of-view RGB video, speech source pose, headset microphone audio, annotated voice activity, speech transcriptions, head and face bounding boxes and source identification labels. We have created and are releasing this dataset to facilitate research in multi-modal AR solutions to the cocktail party problem.

22 papers15 benchmarksAudio, Dialog, Images, RGB Video, Speech, Time series, Videos

Market-1501-C

Market-1501-C is an evaluation set that consists of algorithmically generated corruptions applied to the Market-1501 test-set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each corruption has five severity levels, resulting in 100 distinct corruptions.

22 papers6 benchmarksImages

3D Cars

Car CAD models from "3d object detection and viewpoint estimation with a deformable 3d cuboid model" were used to generate the dataset. For each of the 199 car models, the authors generated $64\times64$ color renderings from 24 rotation angles each offset by 15 degrees, as well as from 4 different camera elevations.

22 papers0 benchmarksImages

Satlas

Satlas is a remote sensing dataset and benchmark that is large in both breadth, featuring all of the aforementioned applications and more, as well as scale, comprising 290M labels under 137 categories and 7 label modalities.

22 papers0 benchmarksImages

VizWiz-Classification

Our goal is to improve upon the status quo for designing image classification models trained in one domain that perform well on images from another domain. Complementing existing work in robustness testing, we introduce the first test dataset for this purpose which comes from an authentic use case where photographers wanted to learn about the content in their images. We built a new test set using 8,900 images taken by people who are blind for which we collected metadata to indicate the presence versus absence of 200 ImageNet object categories. We call this dataset VizWiz-Classification.

22 papers8 benchmarksImages

WebSRC (WebSRC: A Dataset for Web-Based Structural Reading Comprehension)

WebSRC is a novel Web-based Structural Reading Comprehension dataset. It consists of 0.44M question-answer pairs, which are collected from 6.5K web pages with corresponding HTML source code, screenshots and metadata. Each question in WebSRC requires a certain structural understanding of a web page to answer, and the answer is either a text span on the web page or yes/no.

22 papers2 benchmarksImages, Tables, Texts

PreviousPage 36 of 164Next