1,019 machine learning datasets
The VIPriors Action Recognition Challenge uses a subset of the UCF101 action recognition dataset.
Video class-agnostic segmentation (VCAS) is the task of segmenting objects without regard to their semantics, combining appearance, motion, and geometry from monocular video sequences. The main motivation is to account for unknown objects in the scene and to act as a redundant signal alongside the segmentation of known classes for better safety, as shown in the accompanying figure.
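As an illustrative sketch only: the "redundant signal" idea can be read as taking the union of a known-class segmentation mask and a class-agnostic mask, so that unknown objects still register as obstacles. The function name and the boolean H×W mask format below are assumptions for illustration, not the dataset's API.

```python
import numpy as np

def fuse_masks(known_class_mask: np.ndarray, class_agnostic_mask: np.ndarray) -> np.ndarray:
    """Return the union of two binary masks: a pixel counts as an object
    if either the known-class head or the class-agnostic head fires."""
    return np.logical_or(known_class_mask, class_agnostic_mask)
```

This way, an object missed by the known-class segmenter (e.g. an unseen category) is still flagged by the class-agnostic branch.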
A first-of-its-kind paired win-fail action understanding dataset with samples from the following domains: "General Stunts," "Internet Wins-Fails," "Trick Shots," and "Party Games." The task is to identify successful and failed attempts at various activities. Unlike existing action recognition datasets, intra-class variation is high, making the task challenging yet feasible.
A dataset for multimodal skill assessment, focused on assessing a piano player's skill level. Annotations include the player's skill level and the song's difficulty level. Bounding-box annotations around the pianists' hands are also provided.
AMT Objects is a large dataset of object-centric videos suitable for training and benchmarking models that generate 3D models of objects from a small number of photos. The dataset consists of multiple views of a large collection of object instances.
This is a video and image segmentation dataset for human heads and shoulders, relevant for creating elegant media for videoconferencing and virtual reality applications. The source data includes ten online conference-style green-screen videos. The authors extracted 3,600 frames from the videos, generated ground-truth masks for each person in each video, and then applied virtual backgrounds to the frames to generate the training/testing sets.
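The compositing step described above can be sketched as a simple alpha blend of the masked subject onto a virtual background. This is a minimal sketch, assuming H×W×3 uint8 images and an H×W float alpha mask; the actual array layout of the released data is not specified in the description.

```python
import numpy as np

def apply_virtual_background(frame, mask, background):
    """Composite the masked person onto a virtual background.
    frame, background: HxWx3 uint8 arrays; mask: HxW float alpha in [0, 1]."""
    alpha = mask.astype(np.float32)[..., None]  # HxWx1, broadcasts over channels
    out = alpha * frame.astype(np.float32) + (1.0 - alpha) * background.astype(np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```

Pixels where the mask is 1 keep the foreground subject; pixels where it is 0 are replaced by the background image.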
WiTA (Writing in The Air) is a dataset for the challenging writing-in-the-air task -- an elaborate task bridging vision and NLP. The dataset consists of five sub-datasets in two languages (Korean and English) and amounts to 209,926 video instances from 122 participants. Finger movement is captured with RGB cameras to ensure wide accessibility and cost-efficiency.
Acticipate is a publicly available dataset of human body-motion and eye-gaze recordings, acquired in an experimental scenario in which an actor interacts with three subjects. It contains synchronised and labelled video, gaze, and body-motion data in a dyadic interaction scenario.
Extended UCF Crime extends the UCF Crime dataset, which consists of 13 anomaly classes. The extension adds two new anomaly classes, "molotov bomb" and "protest", and adds 33 videos to the fighting class. In total, the extension adds 216 videos to the training set and 17 videos to the test set.
Near-Collision is a large-scale dataset of 13,658 egocentric video snippets of humans navigating indoor hallways. To support ground-truth annotation of human pose, each video is provided with the corresponding 3D LiDAR point cloud.
Dense Forest Trail is a UAV dataset collected from a variety of simulated environments in Unreal Engine.
The RBO dataset of articulated objects and interactions is a collection of 358 RGB-D video sequences (67:18 minutes) of humans manipulating 14 articulated objects under varying conditions (light, perspective, background, interaction). All sequences are annotated with ground truth of the poses of the rigid parts and the kinematic state of the articulated object (joint states) obtained with a motion capture system. We also provide complete kinematic models of these objects (kinematic structure and three-dimensional textured shape models). In 78 sequences the contact wrenches during the manipulation are also provided.
D-OCC is a large-scale dataset of 5,617 dialogues to enable fine-grained evaluation and analysis of various dialogue systems. It is used to study common grounding in dynamic environments.
This dataset contains about 2,500 trajectories (with images and actions) of a Sawyer robot interacting with various objects.
Extended-YouTube Faces (E-YTF) extends the well-known YouTube Faces (YTF) dataset and is specifically designed to further push the challenges of face recognition by addressing open-set face identification from heterogeneous data, i.e., still images vs. video.
The TLFM dataset is structured into sequences of at least nine timesteps. It includes 9,696 images of both brightfield (BF) and green fluorescent protein (GFP) channels at a resolution of 256 × 256. It is a dataset for multi-domain (BF and GFP) microscopy image sequence generation.
The Reasonable Crowd dataset evaluates autonomous driving in a limited operating domain. The data consists of 92 traffic scenarios, each with multiple ways of traversing it. Multiple annotators expressed their preference between pairs of scenario traversals.
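The pairwise-preference annotations can be pictured as records naming two traversals of the same scenario and the one an annotator preferred. The record layout and tally below are a hypothetical illustration, not the dataset's actual schema.

```python
from collections import defaultdict

def win_counts(preferences):
    """Tally how often each traversal was preferred across annotators.
    Each record is (traversal_a, traversal_b, preferred)."""
    wins = defaultdict(int)
    for traversal_a, traversal_b, preferred in preferences:
        wins[preferred] += 1
    return dict(wins)
```

Such tallies are one simple way to turn pairwise human preferences into a per-traversal score for evaluating driving behaviour.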
TinyVIRAT-v2 is a benchmark dataset for recognizing real-world low-resolution activities in videos. The dataset comprises naturally occurring low-resolution actions. It extends the TinyVIRAT dataset and consists of actions with multiple labels. The videos are extracted from security footage, which makes them realistic and more challenging.
This is a private dataset collected for automatic analysis of psychological distress. It contains self-reported distress labels provided by human volunteers and consists of 30-minute interview recordings of participants.
Trope Understanding in Movies and Animations (TrUMAn) is a dataset intended for evaluating and developing learning systems that reason beyond visual signals.