The feature files are named with the YouTube IDs. https://drive.google.com/drive/folders/10-6hkQxMKMGwLXANxfPRE7xw5PKiMjLn?usp=sharing
A dataset of music videos with continuous valence/arousal ratings as well as emotion tags.
Sign Language Datasets for French Belgian Sign Language. This dataset is built upon the work of Belgian linguists from the University of Namur. Over eight years, they collected and annotated 50 hours of video depicting sign language conversation. 100 signers were recorded, making it one of the most representative sign language corpora. The annotation has been sanitized and enriched with metadata to construct two easy-to-use datasets for sign language recognition: one for continuous sign language recognition and the other for isolated sign recognition.
The CANDOR corpus is a large, novel, multimodal corpus of 1,656 recorded conversations in spoken English. This 7+ million word, 850-hour corpus totals over 1 TB of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, along with an extensive survey of speakers' post-conversation reflections.
VidHarm is a professionally annotated dataset for detection of harmful content in video. It includes 3,589 annotated video clips drawn from a variety of film trailers. In contrast to previous approaches, which mostly use metadata from long sequences, it uses the raw video and focuses on short clips.
EgoMon Gaze & Video Dataset is an Egocentric (first person) Dataset that consists of 7 videos of 30 minutes, more or less, each one of them. - 7 videos with the gaze information plotted on them. - The same videos (without the gaze information plotted on them). - A total of 13428 images, more or less, that corresponds to each frame per second of all these videos. - 7 text files with the gaze data extracted from each video.
The dataset has been designed to represent true web videos in the wild, with good visual quality and diverse content characteristics, and will serve as the evaluation basis for the Video Browser Showdown 2019-2021 and the TREC Video Retrieval (TRECVID) Ad-Hoc Video Search tasks 2019-2021. The dataset comes with a shot segmentation (around 1 million shots) for which we analyze content specifics and statistics. Our analysis shows that the content of V3C1 is very diverse, has no predominant characteristics, and exhibits low self-similarity. It is therefore well suited for video retrieval evaluations as well as for participants in TRECVID AVS or the VBS.
This dataset has likewise been designed to represent true web videos in the wild, with good visual quality and diverse content characteristics. It is the test video collection for the TRECVID AVS 2019-2021 tasks, containing 1,082,649 web video clips with even more diverse content, no predominant characteristics, and low self-similarity.
Onchocerciasis is causing blindness in over half a million people in the world today. Drug development for the disease is crippled because there is no way of measuring a drug's effectiveness without an invasive procedure. Measuring drug efficacy by assessing the viability of onchocerca worms requires patients to undergo nodulectomy, an invasive, expensive, time-consuming, skill-dependent, infrastructure-dependent, and lengthy procedure.
In the last two years, millions of lives have been lost to COVID-19. Despite a year of vaccination programmes, hospitalization rates and deaths remain high due to new variants of COVID-19. Stringent guidelines and COVID-19 screening measures, such as temperature checks and mask checks at all public places, are helping reduce the spread of COVID-19. Visual inspection to enforce these screening measures can be taxing and error-prone; automated inspection ensures effective and accurate screening.
Here we release the dataset (Multi_Channel_Grid, abbreviated as MC_Grid) used in our paper LiMuSE: Lightweight Multi-Modal Speaker Extraction.
USC-GRAD-STDdb comprises 115 video segments containing more than 25,000 annotated frames at HD 720p resolution (≈1280x720), with small objects of interest ranging from 16 (≈4x4) to 256 (≈16x16) pixels in area. Video lengths range from 150 to 500 frames. The size of each object is determined by its bounding box, so accurate annotation is of utmost importance for reliable performance metrics; as might be expected, the smaller the object, the harder the annotation. The annotation was carried out with the ViTBAT tool, fitting the boxes as tightly as possible to the objects of interest in each video frame. In total, more than 56,000 ground-truth labels have been generated.
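As a quick illustration of the size criterion, here is a minimal sketch (the box coordinates below are invented examples, not taken from the USC-GRAD-STDdb annotations) that computes the pixel area of a bounding box and checks whether it falls in the 16-256 pixel small-object range:

```python
def bbox_area(x_min: int, y_min: int, x_max: int, y_max: int) -> int:
    """Pixel area of an axis-aligned bounding box."""
    return max(0, x_max - x_min) * max(0, y_max - y_min)

def is_small_object(area: int, lo: int = 16, hi: int = 256) -> bool:
    """True if the box area falls in the dataset's small-object range (≈4x4 to ≈16x16)."""
    return lo <= area <= hi

# Example: a 10x12 pixel box counts as a small object, a 40x40 box does not.
print(is_small_object(bbox_area(100, 50, 110, 62)))   # True  (120 px)
print(is_small_object(bbox_area(100, 50, 140, 90)))   # False (1600 px)
```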
This spatio-temporal action dataset for video understanding consists of 4 parts: original videos, cropped videos, video frames, and annotation files. It uses a proposed new multi-person annotation method for spatio-temporal actions: first, ffmpeg is used to crop the videos and extract frames; then YOLOv5 detects the people in each frame, and DeepSORT assigns an ID to each detected person. By processing the YOLOv5 and DeepSORT outputs, the annotation file of the spatio-temporal action dataset is obtained, completing the construction of a custom spatio-temporal action dataset. A sketch of the detection-and-ID step is shown below.
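The following is a minimal sketch of that detection-and-ID step, assuming YOLOv5 loaded through torch.hub and the third-party deep-sort-realtime package for DeepSORT; the clip name, output CSV, and its columns are placeholders, not the dataset's actual annotation format.

```python
# A minimal sketch, assuming YOLOv5 via torch.hub and the deep-sort-realtime package;
# the clip name and CSV layout are placeholders.
import csv
import cv2
import torch
from deep_sort_realtime.deepsort_tracker import DeepSort

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
tracker = DeepSort(max_age=30)

cap = cv2.VideoCapture("clip_0001.mp4")  # hypothetical cropped clip
with open("clip_0001_annotations.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["frame", "person_id", "x1", "y1", "x2", "y2"])
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # YOLOv5 detection; convert BGR -> RGB and keep only 'person' (COCO class 0).
        results = model(frame[:, :, ::-1])
        detections = [([x1, y1, x2 - x1, y2 - y1], conf, "person")
                      for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist()
                      if int(cls) == 0]
        # DeepSORT assigns a persistent ID to each detected person across frames.
        for track in tracker.update_tracks(detections, frame=frame):
            if not track.is_confirmed():
                continue
            x1, y1, x2, y2 = track.to_ltrb()
            writer.writerow([frame_idx, track.track_id, int(x1), int(y1), int(x2), int(y2)])
        frame_idx += 1
cap.release()
```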
Kinetics-GEB+ (Generic Event Boundary Captioning, Grounding and Retrieval) is a dataset that consists of over 170k boundaries associated with captions describing status changes in the generic events in 12K videos.
W-Oops consists of 2,100 unintentional human action videos, with 44 goal-directed and 30 unintentional video-level activity labels collected through human annotations.
A dataset automatically generated using question generation neural models and alt-text video captions from the WebVid dataset, with 3M video-question-answer triplets.
In this Pre-Contest Workshop Video Recordings folder:
A main goal of the Urban Soundscapes of the World project is to create a reference database of examples of urban acoustic environments, consisting of high-quality immersive audiovisual recordings (360-degree video and spatial audio), in adherence to ISO 12913-2. Ultimately, this database may set the scope for the immersive recording and reproduction of urban acoustic environments with soundscape in mind.
The SoccerTrack dataset comprises top-view and wide-view video footage annotated with bounding boxes. GNSS coordinates of each player are also provided. We hope that the SoccerTrack dataset will help advance the state of the art in multi-object tracking, especially in team sports.
In this dataset two robots, Baxter and UR5, perform 8 behaviors (look, grasp, pick, hold, shake, lower, drop, and push) on 95 objects that vary by 5 colors (blue, green, red, white, and yellow), 6 contents (wooden buttons, plastic dice, glass marbles, nuts & bolts, pasta, and rice), and 4 weights (empty, 50g, 100g, and 150g). There are 90 objects with contents (5 colors x 3 weights x 6 contents) and 5 objects without any content that vary only by color. Both robots perform 5 trials on each object, resulting in 7,600 interactions (2 robots x 8 behaviors x 95 objects x 5 trials).
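As a sanity check on the arithmetic above, here is a short sketch that enumerates the object and interaction combinations; the labels are taken from the description and nothing dataset-specific is assumed.

```python
from itertools import product

colors = ["blue", "green", "red", "white", "yellow"]
contents = ["wooden buttons", "plastic dice", "glass marbles", "nuts & bolts", "pasta", "rice"]
filled_weights = ["50g", "100g", "150g"]

# 90 filled objects (color x weight x content) plus 5 empty objects (color only).
objects = list(product(colors, filled_weights, contents)) + [(c, "empty", None) for c in colors]
print(len(objects))                                  # 95 objects

robots, behaviors, trials = 2, 8, 5
print(robots * behaviors * len(objects) * trials)    # 7600 interactions
```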