Datasets

3,275 machine learning datasets

3DIdent

Novel benchmark which features aspects of natural scenes, e.g. a complex 3D object and different lighting conditions, while still providing access to the continuous ground-truth factors.

10 papers | 1 benchmark | Images

StyleGAN-Human

A large-scale human image dataset with over 230K samples capturing diverse poses and textures.

10 papers | 0 benchmarks | Images

CropAndWeed

The CropAndWeed dataset focuses on the fine-grained identification of 74 relevant crop and weed species, with a strong emphasis on data variability. Annotations of labeled bounding boxes, semantic masks and stem positions are provided for about 112k instances in more than 8k high-resolution images of both real-world agricultural sites and specifically cultivated outdoor plots of rare weed types. Additionally, each sample is enriched with meta-annotations regarding environmental conditions.

10 papers | 0 benchmarks | Images

Mocheg

A large-scale dataset that consists of 21,184 claims, where each claim is assigned a truthfulness label and ruling statement, with 58,523 pieces of evidence in the form of text and images. It supports the end-to-end multimodal fact-checking and explanation generation, where the input is a claim and a large collection of web sources, including articles, images, videos, and tweets, and the goal is to assess the truthfulness of the claim by retrieving relevant evidence and predicting a truthfulness label (i.e., support, refute and not enough information), and generate a rationalization statement to explain the reasoning and ruling process.

10 papers | 0 benchmarks | Images, Texts

MP-DocVQA (Multipage Document Visual Question Answering)

The dataset is designed for Visual Question Answering on multipage scanned industry documents. The questions and answers are reused from the Single Page DocVQA (SP-DocVQA) dataset. The images also correspond to those in the original dataset, extended with the preceding and following pages, up to a limit of 20 pages per document.

10 papers | 0 benchmarks | Images, Texts

DOTA 2.0 (Dataset of Object deTection in Aerial images)

In the past decade, object detection has achieved significant progress in natural images but not in aerial images, due to the massive variations in the scale and orientation of objects caused by the bird’s-eye view of aerial images. More importantly, the lack of large-scale benchmarks has become a major obstacle to the development of object detection in aerial images (ODAI). In this paper, we present a large-scale Dataset of Object deTection in Aerial images (DOTA) and comprehensive baselines for ODAI. The proposed DOTA dataset contains 1,793,658 object instances of 18 categories of oriented-bounding-box annotations collected from 11,268 aerial images. Based on this large-scale and well-annotated dataset, we build baselines covering 10 state-of-the-art algorithms with over 70 configurations, where the speed and accuracy performances of each model have been evaluated. Furthermore, we provide a code library for ODAI and build a website for evaluating different algorithms. Previous chall

10 papers | 0 benchmarks | Images

InDL (In-Diagram Logic)

Dataset Introduction

10 papers | 1 benchmark | Images

Description Detection Dataset

Description Detection Dataset ($D^3$, /dikju:b/) is an attempt at creating a next-generation object detection dataset. Unlike traditional detection datasets, the class names of the objects are no longer simple nouns or noun phrases, but rather complex and descriptive, such as "a dog not being held by a leash". For each image in the dataset, any object that matches the description is annotated. The dataset provides annotations such as bounding boxes and finely crafted instance masks. It comprises 422 well-designed descriptions and 24,282 positive object-description pairs.

10 papers | 15 benchmarks | Images, Texts

DemogPairs

Although deep face recognition has achieved impressive results in recent years, there is increasing controversy regarding racial and gender bias of the models, questioning their trustworthiness and deployment into sensitive scenarios. DemogPairs is a validation set with 10.8K facial images and 58.3M identity verification pairs, distributed in demographically-balanced folds of Asian, Black and White females and males. We also propose a benchmark of experiments using DemogPairs over state-of-the-art deep face recognition models in order to analyze their cross-demographic behavior and potential demographic biases (see figure below).

10 papers | 0 benchmarks | Images

FunnyBirds

FunnyBirds is a synthetic vision dataset developed to automatically and quantitatively analyze XAI methods. It consists of 50,500 images (50k train, 500 test) of 50 synthetic bird species.

10 papers | 0 benchmarks | Images

GRAZPEDWRI-DX

GRAZPEDWRI-DX is a public dataset of 20,327 pediatric wrist trauma X-ray images released by the University of Medicine of Graz. These X-ray images were collected by multiple pediatric radiologists at the Department for Pediatric Surgery of the University Hospital Graz between 2008 and 2018, involving 6,091 patients and a total of 10,643 studies. This dataset is annotated with 74,459 image labels, featuring a total of 67,771 labeled objects.

10 papers | 20 benchmarks | Images, Medical

ViP-Bench (Making Large Multimodal Models Understand Arbitrary Visual Prompts)

ViP-Bench is a comprehensive benchmark designed to assess the capability of multimodal models in understanding visual prompts across multiple dimensions. It aims to evaluate how well these models interpret various visual prompts, including recognition, OCR, knowledge, math, relationship reasoning, and language generation. ViP-Bench includes a diverse set of 303 images and questions, providing a thorough assessment of visual understanding capabilities at the region level. This benchmark sets a foundation for future research into multimodal models with arbitrary visual prompts.

10 papers | 4 benchmarks | Images, Interactive, Texts

RVSD (Realistic Video DeSnowing Dataset)

Realistic Video DeSnowing Dataset (RVSD) contains a total of 110 pairs of videos. Each pair contains snowy and hazy videos and corresponding snow-free and haze-free ground truth videos. We use a rendering engine (Unreal Engine 5) and various augmentation techniques to generate snow and haze with diverse and realistic physical properties. This results in more realistic and varied synthesized videos, which improve the model’s performance on real-world data.

10 papers | 0 benchmarks | Images, Videos

HO-3D v3

HO-3D v3 is version 3 of the HO-3D dataset. It provides more accurate annotations for both the hand and object poses, resulting in better estimates of the contact regions between the hand and the object. The table below shows the statistics of the HO-3D v2 compared to the HO-3D v3 datasets.

10 papers | 36 benchmarks | Images

CDDB (Continual Deepfake Detection Benchmark)

A number of benchmarks and techniques have emerged for the detection of deepfakes. However, very few works study the detection of incrementally appearing deepfakes in real-world scenarios. To simulate the wild scenes, this paper suggests a continual deepfake detection benchmark (CDDB) over a new collection of deepfakes from both known and unknown generative models. The suggested CDDB designs multiple evaluations on the detection over easy, hard, and long sequences of deepfake tasks, with a set of appropriate measures. In addition, we exploit multiple approaches to adapt multiclass incremental learning methods, commonly used in continual visual recognition, to the continual deepfake detection problem. We evaluate existing methods, including their adapted ones, on the proposed CDDB. Within the proposed benchmark, we explore some commonly known essentials of standard continual learning. Our study provides new insights on these essentials in the context of cont

10 papers | 0 benchmarks | Images

Atari-HEAD

Atari-HEAD is a dataset of human actions and eye movements recorded while playing Atari video games. For every game frame, the corresponding image frame, the human keystroke action, the reaction time to make that action, the gaze positions, and the immediate reward returned by the environment were recorded. The gaze data was recorded using an EyeLink 1000 eye tracker at 1000Hz. The human subjects are amateur players who are familiar with the games. The human subjects were only allowed to play for 15 minutes and were required to rest for at least 15 minutes before the next trial. Data was collected from 4 subjects, 16 games, 175 15-minute trials, and a total of 2.97 million frames/demonstrations.

9 papers | 0 benchmarks | Actions, Images, Tracking

Imagewoof

Imagewoof is a subset of 10 dog breed classes from ImageNet. The breeds are: Australian terrier, Border terrier, Samoyed, Beagle, Shih-Tzu, English foxhound, Rhodesian ridgeback, Dingo, Golden retriever, Old English sheepdog.

9 papers | 0 benchmarks | Images

TUM-GAID

TUM-GAID (TUM Gait from Audio, Image and Depth) contains recordings of 305 subjects performing two walking trajectories in an indoor environment. The first trajectory is traversed from left to right and the second one from right to left. Two recording sessions were performed, one in January, where subjects wore heavy jackets and mostly winter boots, and another one in April, where subjects wore lighter clothes. The action is captured by a Microsoft Kinect sensor, which provides a video stream with a resolution of 640×480 pixels and a frame rate of around 30 FPS.

9 papers | 0 benchmarks | Audio, Images, Videos

Middlebury 2005

Middlebury 2005 is a stereo dataset of indoor scenes.

9 papers | 0 benchmarks | Images, Stereo

SketchyScene

SketchyScene is a large-scale dataset of scene sketches to advance research on sketch understanding at both the object and scene level. The dataset is created through a novel and carefully designed crowdsourcing pipeline, enabling users to efficiently generate large quantities of realistic and diverse scene sketches. SketchyScene contains more than 29,000 scene-level sketches, 7,000+ pairs of scene templates and photos, and 11,000+ object sketches. All objects in the scene sketches have ground-truth semantic and instance masks. The dataset is also highly scalable and extensible, easily allowing augmenting and/or changing scene composition.

9 papers | 0 benchmarks | Images
