19,997 machine learning datasets
AIC (AI Challenger) is a large-scale dataset with three sub-datasets: human keypoint detection (HKD), large-scale attribute dataset (LAD), and image Chinese captioning (ICC).
The MSRVTT-MC (Multiple Choice) dataset is a video question-answering dataset created based on the MSR-VTT dataset. It consists of 2,990 questions generated from 10,000 video clips with associated ground truth captions. For each question, there are five candidate captions, including the ground truth caption and four randomly sampled negative choices. The objective of the dataset is to choose the correct answer from the five candidate captions.
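The multiple-choice protocol above can be sketched as follows. This is an illustrative reconstruction, not the official evaluation code; the field names (`candidates`, `answer`, `video_id`) and the scoring interface are assumptions.

```python
import random

def mc_accuracy(questions, score_fn):
    """Accuracy over multiple-choice questions: pick the candidate
    caption with the highest model score and compare to ground truth."""
    correct = 0
    for q in questions:
        # q["candidates"]: 5 captions; q["answer"]: index of the ground truth
        pred = max(range(len(q["candidates"])),
                   key=lambda i: score_fn(q["video_id"], q["candidates"][i]))
        correct += int(pred == q["answer"])
    return correct / len(questions)

# Toy usage with a random scorer: chance level is about 1/5
# because each question has five candidate captions.
random.seed(0)
qs = [{"video_id": v, "candidates": [f"c{i}" for i in range(5)], "answer": 0}
      for v in range(1000)]
acc = mc_accuracy(qs, lambda vid, cap: random.random())
```

A real video-text model would replace the random `score_fn` with a video-caption similarity score.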
The T2Dv2 dataset consists of 779 tables originating from the English-language subset of the WebTables corpus. 237 tables are annotated for the Table Type Detection task, 236 for the Columns Property Annotation (CPA) task and 235 for the Row Annotation task. The annotations that are used are DBpedia types, properties and entities.
DailyTalk is a high-quality conversational speech dataset designed for Text-to-Speech. We sampled, modified, and recorded 2,541 dialogues from the open-domain dialogue dataset DailyDialog that are long enough to represent the context of each dialogue.
ArSarcasm is a new Arabic sarcasm detection dataset. The dataset was created using previously available Arabic sentiment analysis datasets (SemEval 2017 and ASTD) and adds sarcasm and dialect labels to them. The dataset contains 10,547 tweets, 1,682 (16%) of which are sarcastic.
We present a comprehensive framework for egocentric interaction recognition using markerless 3D annotations of two hands manipulating objects. To this end, we propose a method to create a unified dataset for egocentric 3D interaction recognition. Our method produces annotations of the 3D pose of two hands and the 6D pose of the manipulated objects, along with their interaction labels for each frame. Our dataset, called H2O (2 Hands and Objects), provides synchronized multi-view RGB-D images, interaction labels, object classes, ground-truth 3D poses for left & right hands, 6D object poses, ground-truth camera poses, object meshes and scene point clouds. To the best of our knowledge, this is the first benchmark that enables the study of first-person actions with the use of the pose of both left and right hands manipulating objects and presents an unprecedented level of detail for egocentric 3D interaction recognition. We further propose a method to predict interaction classes by estima
The SIND dataset is based on 4K video captured by drones and provides traffic participant trajectories, traffic light status, and high-definition maps.
MIntRec is a novel dataset for multimodal intent recognition. It formulates coarse-grained and fine-grained intent taxonomies based on the data collected from the TV series Superstore. The dataset consists of 2,224 high-quality samples with text, video, and audio modalities and has multimodal annotations among twenty intent categories.
The human-related version of the ShanghaiTech Campus dataset was first presented by Morais et al. in the paper "Learning Regularity in Skeleton Trajectories for Anomaly Detection in Videos".
Localized Audio Visual DeepFake Dataset (LAV-DF).
The dataset contains 2,624 samples of $300\times300$ pixel, 8-bit grayscale images of functional and defective solar cells with varying degrees of degradation, extracted from 44 different solar modules. The defects in the annotated images are either intrinsic or extrinsic and are known to reduce the power efficiency of solar modules.
Node classification on Texas with the fixed 48%/32%/20% splits provided by Geom-GCN.
Node classification on Film with the fixed 48%/32%/20% splits provided by Geom-GCN.
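The fixed 48%/32%/20% split convention used in the two entries above can be sketched as boolean node masks. This is an illustrative reconstruction of the proportions only; the actual Geom-GCN splits are fixed, published files, and the seed and helper name here are assumptions.

```python
import numpy as np

def make_split(num_nodes, train_frac=0.48, val_frac=0.32, seed=0):
    """Random 48%/32%/20% train/val/test node masks (Geom-GCN-style
    proportions; the published benchmark splits themselves are fixed)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    train_mask = np.zeros(num_nodes, dtype=bool)
    val_mask = np.zeros(num_nodes, dtype=bool)
    test_mask = np.zeros(num_nodes, dtype=bool)
    train_mask[perm[:n_train]] = True
    val_mask[perm[n_train:n_train + n_val]] = True
    test_mask[perm[n_train + n_val:]] = True
    return train_mask, val_mask, test_mask

# The WebKB Texas graph has 183 nodes.
tr, va, te = make_split(183)
```

Models are then trained on `train_mask` nodes, tuned on `val_mask`, and reported on `test_mask`.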
PromptSpeech is a dataset that consists of speech and the corresponding prompts. We synthesize speech with 5 different style factors (gender, pitch, speaking speed, volume, and emotion) from a commercial TTS API. The emotion factor has 5 categories and the gender factor has 2 categories.
Vision-language modeling has enabled open-vocabulary tasks where predictions can be queried using any text prompt in a zero-shot manner. Existing open-vocabulary tasks focus on object classes, whereas research on object attributes is limited due to the lack of a reliable attribute-focused evaluation benchmark. This paper introduces the Open-Vocabulary Attribute Detection (OVAD) task and the corresponding OVAD benchmark. The objective of the novel task and benchmark is to probe object-level attribute information learned by vision-language models. To this end, we created a clean and densely annotated test set covering 117 attribute classes on the 80 object classes of MS COCO. It includes positive and negative annotations, which enables open-vocabulary evaluation. Overall, the benchmark consists of 1.4 million annotations. For reference, we provide a first baseline method for open-vocabulary attribute detection. Moreover, we demonstrate the benchmark's value by studying the attribute dete
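The role of positive and negative annotations described above can be sketched as a per-attribute average precision over annotated boxes. This is a simplified, hypothetical sketch of that style of evaluation, not the benchmark's actual metric code; in the real benchmark, boxes without an annotation for an attribute would be excluded.

```python
import numpy as np

def attribute_ap(labels, scores):
    """Average precision for one attribute class over boxes that carry
    a positive (1) or negative (0) annotation for that attribute."""
    order = np.argsort(-np.asarray(scores))  # rank boxes by score, descending
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                   # true positives at each rank
    precision = tp / (np.arange(len(labels)) + 1)
    # Mean of the precision values at each positive box.
    return float((precision * labels).sum() / labels.sum())

# Two positives ranked 1st and 3rd out of four annotated boxes.
ap = attribute_ap([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```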
The Fisheye dataset, designed for fisheye motion estimation, comprises synthetically generated fisheye sequences and fisheye video sequences captured with a real fisheye camera.
ShapeGlot: Learning Language for Shape Differentiation
The Vimeo Creative Commons Collection, in short V3C, is a collection of 28,450 videos (with an overall length of about 3,800 h) published under a Creative Commons license on Vimeo. V3C comes with a shot segmentation for each video, together with the resulting keyframes in original as well as reduced resolution, and additional metadata. It is intended for use from 2019 onwards at the international large-scale TREC Video Retrieval Evaluation campaign (TRECVid).
The UCR Anomaly Archive is a collection of 250 univariate time series collected in human medicine, biology, meteorology, and industry. The collected time series contain a few natural anomalies, though the majority of the anomalies are artificial. The dataset was first used in an anomaly detection contest preceding the ACM SIGKDD 2021 conference. Each of the time series contains exactly one, occasionally subtle, anomaly after a given timestamp; the data before that timestamp can be considered normal. The time series collected in the UCR Anomaly Archive can be categorized into 12 types originating from the four domains human medicine, meteorology, biology, and industry. The distribution across the domains is highly imbalanced, with around 64% of the time series collected in human medicine applications, 22% in biology, 9% in industry, and 5% being air temperature measurements. The time series within a single type (e.g. ECG) are not completely unique, but differ in terms of injected an
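The one-anomaly-after-a-training-prefix protocol described above can be sketched as follows. The series here is synthetic, the detector is a trivial z-score baseline, and both function names are illustrative assumptions, not archive code.

```python
import numpy as np

def split_ucr_series(series, train_end):
    """UCR Anomaly Archive protocol: everything before `train_end` is
    guaranteed anomaly-free and may be used for training; the single
    anomaly lies somewhere after it."""
    return series[:train_end], series[train_end:]

def zscore_detector(train, test):
    """Trivial baseline: flag the test point farthest (in z-score units
    of the training data) from the training mean."""
    mu, sigma = train.mean(), train.std()
    return int(np.argmax(np.abs((test - mu) / sigma)))

# Toy series: Gaussian noise with one injected spike after the prefix.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 2000)
x[1500] += 10.0  # artificial anomaly, as in most archive series
train, test = split_ucr_series(x, 1000)
loc = zscore_detector(train, test)  # index within the test segment
```

Real archive series encode the training-prefix boundary and the anomaly location in the file name, so scoring checks whether `train_end + loc` falls inside the annotated anomaly window.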
The DPD dataset has two versions: single-view and dual-view. This branch is for dual-view benchmark evaluation.