Please find more details of this dataset at https://alex-xun-xu.github.io/ProjectPage/CVPR_18/index.html
The Florentine dataset is a dataset of facial gestures containing facial clips from 160 subjects (both male and female), in which gestures were either artificially produced on request (posed) or genuinely elicited by a shown stimulus (induced). 1,032 clips were captured for posed expressions and 1,745 clips for induced facial expressions, for a total of 2,777 video clips. Genuine facial expressions were induced using visual stimuli, i.e. videos selected at random from a bank of YouTube videos chosen to elicit a specific emotion.
The Daimler Monocular Pedestrian Detection dataset is a dataset for pedestrian detection in urban environments. The training set contains 15,560 pedestrian samples (image cut-outs at 48×96 resolution) and 6,744 additional full images without pedestrians for extracting negative samples. The test set is an independent sequence of more than 21,790 images with 56,492 pedestrian labels (fully visible or partially occluded), captured from a vehicle during a 27-minute drive through urban traffic.
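Since the negative training data is distributed as full, pedestrian-free images, a minimal sketch of how 48×96 negative crops could be sampled from them is given below; the function name, path handling and crop count are hypothetical and not part of the official tooling.

import random
from PIL import Image

PATCH_W, PATCH_H = 48, 96  # pedestrian cut-out resolution quoted above

def sample_negative_crops(image_path, n_crops=5, seed=0):
    # Randomly crop n_crops 48x96 patches from an image known to contain no pedestrians.
    rng = random.Random(seed)
    img = Image.open(image_path)
    w, h = img.size
    crops = []
    for _ in range(n_crops):
        x = rng.randint(0, w - PATCH_W)
        y = rng.randint(0, h - PATCH_H)
        crops.append(img.crop((x, y, x + PATCH_W, y + PATCH_H)))
    return crops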
The Caltech Resident-Intruder Mouse dataset (CRIM13) consists of 237 pairs of videos (recorded with synchronized top and side views) of pairs of mice engaging in social behavior, catalogued into thirteen different actions. Each video lasts ~10 minutes, for a total of 88 hours of video and 8 million frames. A team of behavior experts annotated each video on a frame-by-frame basis for a state-of-the-art study of the neurophysiological mechanisms involved in aggression and courtship in mice.
The Oxford Town Center dataset is a 5-minute video with 7,500 annotated frames, divided into 6,500 frames for training and 1,000 for testing, used for pedestrian detection; a sketch of such a split is shown below. The data was recorded from a CCTV camera in Oxford for research and development in activity and face recognition.
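A minimal sketch of reproducing such a frame-level train/test split, assuming the per-frame bounding boxes are available as a CSV; the column names and the chronological ordering of the split are assumptions, not the official ground-truth format.

import csv
from collections import defaultdict

def load_town_centre_split(csv_path, n_train=6500):
    # Group boxes by frame index, then take the first 6,500 frames for
    # training and the remaining 1,000 for testing (the chronological
    # ordering of the split is an assumption here).
    boxes_by_frame = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            box = (float(row["x1"]), float(row["y1"]), float(row["x2"]), float(row["y2"]))
            boxes_by_frame[int(row["frame"])].append(box)
    frames = sorted(boxes_by_frame)
    train = {k: boxes_by_frame[k] for k in frames[:n_train]}
    test = {k: boxes_by_frame[k] for k in frames[n_train:]}
    return train, test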
MobiFace is the first dataset for single face tracking in mobile situations. It consists of 80 unedited live-streaming mobile videos captured by 70 different smartphone users in fully unconstrained environments. Over 95K bounding boxes are manually labelled. The videos are carefully selected to cover typical smartphone usage. The videos are also annotated with 14 attributes, including 6 newly proposed attributes and 8 commonly seen in object tracking.
EGO-CH is a dataset of egocentric videos for visitors' behavior understanding. The dataset has been collected in two different cultural sites and includes more than 27 hours of video acquired by 70 subjects, including volunteers and 60 real visitors. The overall dataset includes labels for 26 environments and over 200 Points of Interest (POIs). Specifically, each video of EGO-CH has been annotated with 1) temporal labels specifying the current location of the visitor and the observed POI, and 2) bounding box annotations around POIs. A large subset of the dataset, consisting of 60 videos, is also associated with surveys filled out by the visitors at the end of each visit.
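A minimal sketch of how one EGO-CH video's labels could be represented in code; the fields mirror the annotation types listed above, but the names and the survey representation are hypothetical, not the official schema.

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class TemporalLabel:
    start_frame: int
    end_frame: int
    environment: str              # one of the 26 labelled environments
    observed_poi: Optional[str]   # one of the 200+ POIs, if any is observed

@dataclass
class PoiBox:
    frame: int
    poi: str
    box: Tuple[int, int, int, int]  # (x, y, width, height) around the POI

@dataclass
class EgoChVideo:
    video_id: str
    temporal_labels: List[TemporalLabel] = field(default_factory=list)
    poi_boxes: List[PoiBox] = field(default_factory=list)
    survey: Optional[Dict[str, str]] = None  # visitor survey, where available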
MlGesture is a dataset for hand gesture recognition tasks, recorded in a car with 5 different sensor types at two different viewpoints. The dataset contains over 1300 hand gesture videos from 24 participants and features 9 different hand gesture symbols. One sensor cluster with five different cameras is mounted in front of the driver in the center of the dashboard. A second sensor cluster is mounted on the ceiling looking straight down.
METU-VIREF is a video referring expression dataset comprising videos from the VIRAT Ground and ILSVRC2015 VID datasets. VIRAT is a surveillance dataset and contains mainly people and vehicles. To align with this and restrict the domain, only ILSVRC videos that contain vehicles are used. The METU-VIREF dataset does not include the videos themselves (they need to be downloaded from the respective sources), only referring expressions for video sequences containing an object pair. For this, object pairs were chosen whose relation allowed a meaningful referring expression to be written.
EDUVSUM contains educational videos with subtitles from three popular e-learning platforms: Edx, YouTube, and TIB AV-Portal. The videos cover the following topics: a crash course on the history of science and engineering, computer science, Python and web programming, machine learning and computer vision, the Internet of Things (IoT), and software engineering. In total, the current version of the dataset contains 98 videos with ground-truth values annotated by a user with an academic background in computer science.
This dataset consists of 18 movies, ranging in duration from 10 to 104 minutes, taken from the OVSD dataset (Rotman et al., 2016). For these videos, the summary length limit is set to the minimum of 4 minutes and 10% of the video length.
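As a worked example of that rule, the limit is simply the smaller of 240 seconds and a tenth of the movie's duration; the function below is illustrative, not part of OVSD.

def summary_limit_seconds(video_length_seconds):
    # The limit is the smaller of 4 minutes and 10% of the movie's length.
    return min(4 * 60, 0.10 * video_length_seconds)

# A 10-minute movie gets a 60-second limit (10% of 600 s);
# a 104-minute movie is capped at the full 240-second (4-minute) limit.
assert summary_limit_seconds(10 * 60) == 60.0
assert summary_limit_seconds(104 * 60) == 240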
VOT2015 is a visual object tracking dataset. The dataset comprises 60 short sequences showing various objects in challenging backgrounds. The sequences were chosen from a large pool of sequences from different sources.
UCF50 is an action recognition dataset with 50 action categories, consisting of realistic videos taken from YouTube. It is an extension of the YouTube Action dataset (UCF11), which has 11 action categories.
ActioNet is a video task-based dataset collected in a synthetic 3D environment. It contains 3,038 annotated videos and hierarchical task structures over 65 individual household tasks from 120 different scenes. Each task is annotated across three to five different scenes by 10 different annotators. The tasks can be broken down into four categories: living room, bedroom, bathroom, and kitchen.
Consists of 10,000+ video-sentence pairs, each accompanied by an annotated, sentence-specified video thumbnail.
Contains a large number of online videos and subtitles.
A dataset for audio-visual event classification and localization in the context of office environments. The audio-visual dataset is composed of 11 event classes recorded at several realistic positions in two different rooms. Two types of sequences were recorded, depending on the number of events per sequence. The dataset comprises 2,662 single-label sequences and 2,724 multi-label sequences, corresponding to a total of 5.24 hours.
CLAD (Complex and Long Activities Dataset) is an activity dataset which exhibits real-life and diverse scenarios of complex, temporally-extended human activities and actions. The dataset consists of a set of videos of actors performing everyday activities in a natural and unscripted manner. The dataset was recorded using a static Kinect 2 sensor, which is commonly used on many robotic platforms. The dataset comprises RGB-D images, point cloud data, and automatically generated skeleton tracks, in addition to crowdsourced annotations.
A large-scale, comprehensive collection of dashcam videos collected by vehicles on DiDi's platform. D2-City contains more than 10,000 video clips, which reflect the diversity and complexity of real-world traffic scenarios in China.
The Large Scale Movie Description Challenge (LSMDC) - Context is an augmented version of the original LSMDC dataset with movie scripts as contextual text.