The field of biomechanics is at a turning point: marker-based motion capture is set to be replaced by portable and inexpensive hardware, rapidly improving markerless tracking algorithms, and open datasets that will turn these new technologies into field-wide team projects. To expedite progress in this direction, we have collected the CMU Panoptic Dataset 2.0, which contains 86 subjects captured with 140 VGA cameras, 31 HD cameras, and 15 IMUs, each performing an average of 6.5 minutes of activities, including range-of-motion activities and tasks of daily living.
In this dataset, a UR5 robot used 6 tools (metal-scissor, metal-whisk, plastic-knife, plastic-spoon, wooden-chopstick, and wooden-fork) to perform 6 behaviors (look, stirring-slow, stirring-fast, stirring-twist, whisk, and poke). The robot explored 15 objects kept in cylindrical containers: cane-sugar, chia-seed, chickpea, detergent, empty, glass-bead, kidney-bean, metal-nut-bolt, plastic-bead, salt, split-green-pea, styrofoam-bead, water, wheat, and wooden-button. The robot performed 10 trials per tool-behavior-object combination, resulting in 5,400 interactions (6 tools x 6 behaviors x 15 objects x 10 trials), and recorded multiple sensory modalities (audio, RGB images, depth images, haptics, and touch images) while interacting with the objects.
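For illustration, here is a minimal Python sketch that enumerates this interaction grid and verifies the 5,400-interaction count. The tool, behavior, and object names are taken from the description above; no file layout or naming convention is assumed, since the dataset does not specify one here.

```python
# Minimal sketch: enumerate the dataset's interaction grid and confirm
# the 5,400-interaction count. Names come from the dataset description.
from itertools import product

TOOLS = ["metal-scissor", "metal-whisk", "plastic-knife",
         "plastic-spoon", "wooden-chopstick", "wooden-fork"]
BEHAVIORS = ["look", "stirring-slow", "stirring-fast",
             "stirring-twist", "whisk", "poke"]
OBJECTS = ["cane-sugar", "chia-seed", "chickpea", "detergent", "empty",
           "glass-bead", "kidney-bean", "metal-nut-bolt", "plastic-bead",
           "salt", "split-green-pea", "styrofoam-bead", "water", "wheat",
           "wooden-button"]
TRIALS = range(10)

interactions = list(product(TOOLS, BEHAVIORS, OBJECTS, TRIALS))
assert len(interactions) == 5400  # 6 tools x 6 behaviors x 15 objects x 10 trials
```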
In this dataset, an upper-torso humanoid robot with a 7-DOF arm explored 100 different objects belonging to 20 different categories using 10 behaviors: Look, Crush, Grasp, Hold, Lift, Drop, Poke, Push, Shake, and Tap.
Robot@Home2 is an enhanced version of the Robot@Home dataset, aimed at improving usability and functionality for developing and testing mobile robotics and computer vision algorithms. Robot@Home2 consists of three main components. First, a relational database that stores the contextual information and data links, compatible with Structured Query Language (SQL). Second, a Python package for managing the database, including downloading, querying, and interfacing functions. Finally, learning resources in the form of Jupyter notebooks, runnable locally or on the Google Colab platform, enabling users to explore the dataset without a local installation. These freely available tools are expected to make the Robot@Home dataset easier to exploit and to accelerate research in computer vision and robotics.
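Because the database is SQL-compatible, it can also be queried directly, as in the minimal sketch below using Python's built-in sqlite3 module. The file name and the rooms/room_type schema are hypothetical placeholders; consult the dataset's documentation or its official Python package for the actual schema and helper functions.

```python
# Minimal sketch of querying a local SQLite copy of the Robot@Home2
# relational database. Database path and table/column names are
# hypothetical placeholders, not the dataset's actual schema.
import sqlite3

conn = sqlite3.connect("rh2.db")  # hypothetical local database file
conn.row_factory = sqlite3.Row    # access columns by name

# Hypothetical query: count rooms per room type.
for row in conn.execute(
    "SELECT room_type, COUNT(*) AS n FROM rooms GROUP BY room_type"
):
    print(row["room_type"], row["n"])

conn.close()
```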
InfraParis is a novel and versatile dataset supporting multiple tasks across three modalities: RGB, depth, and infrared. Spanning the city center to the suburbs, it captures a variety of urban styles across the greater Paris area, providing rich semantic information. InfraParis contains 7,301 images with bounding boxes and full semantic annotations (19 classes). We assess various state-of-the-art baseline techniques, encompassing models for semantic segmentation, object detection, and depth estimation.
LSA-T is the first continuous Argentinian Sign Language (LSA) dataset. It contains 14,880 sentence-level LSA videos extracted from the CN Sordos YouTube channel, with labels and keypoint annotations for each signer. Videos are full HD (1920x1080) at 30 FPS.
Understanding comprehensive assembly knowledge from videos is critical for a future ultra-intelligent industry. To enable technological breakthroughs, we present HA-ViD, an assembly video dataset that features representative industrial assembly scenarios, a natural procedural knowledge acquisition process, and consistent human-robot shared annotations. Specifically, HA-ViD captures diverse collaboration patterns of real-world assembly, natural human behaviors and learning progression during assembly, and fine-grained action annotations decomposed into subject, action verb, manipulated object, target object, and tool. We provide 3,222 multi-view, multi-modality videos, 1.5M frames, 96K temporal labels, and 2M spatial labels. We benchmark four foundational video understanding tasks: action recognition, action segmentation, object detection, and multi-object tracking. Importantly, we analyze their performance and the further reasoning steps required for comprehending knowledge of assembly progress and process efficiency.
FreeMan is the first large-scale multi-view human motion dataset collected in real-world scenarios. FreeMan was captured by synchronizing 8 smartphones across diverse scenarios. It comprises 11M frames from 8,000 sequences viewed from different perspectives. These sequences cover 40 subjects across 10 different scenarios, each with varying lighting conditions.
The sports industry is witnessing an increasing trend of utilizing multiple synchronized sensors for player data collection, enabling personalized training systems with multi-perspective real-time feedback. Badminton could benefit from such sensors, but comprehensive badminton action datasets for analysis and training feedback are scarce. Addressing this gap, this paper introduces a multi-sensor badminton dataset for forehand clear and backhand drive strokes, designed around interviews with coaches for optimal usability. The dataset covers various skill levels, including beginners, intermediates, and experts, providing resources for understanding biomechanics across skill levels. It encompasses 7,763 badminton swings from 25 players, featuring sensor data on eye tracking, body tracking, muscle signals, and foot pressure. The dataset also includes video recordings and detailed annotations on stroke type, skill level, sound, ball landing, and hitting location.
The DADE dataset, short for Driving Agents in Dynamic Environments, is a synthetic dataset for training and evaluating semantic segmentation methods for autonomous driving agents navigating dynamic environments and changing weather conditions.
ConSLAM is a real-world dataset collected periodically on a construction site to measure the accuracy of mobile scanners' SLAM algorithms.
The YCB-Ev dataset contains synchronized RGB-D frames and event data, enabling the evaluation of 6DoF object pose estimation algorithms using these modalities. It provides ground-truth 6DoF object poses for the same 21 YCB objects used in the YCB-Video (YCB-V) dataset, allowing for cross-dataset algorithm performance evaluation. The dataset consists of 21 synchronized event and RGB-D sequences, totalling 13,851 frames (7 minutes and 43 seconds of event data). Notably, 12 of these sequences feature the same object arrangement as the YCB-V subset used in the BOP challenge.
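As a sketch of how such ground-truth poses are typically used, the snippet below computes the ADD error (average distance of transformed model points) between an estimated and a ground-truth 6DoF pose. ADD is a common BOP-style metric for YCB objects, not necessarily the evaluation protocol prescribed by YCB-Ev, and the poses and points here are synthetic.

```python
# Minimal sketch of the ADD pose-error metric: the mean distance between
# object model points transformed by the estimated pose versus by the
# ground-truth pose. Inputs below are synthetic toy data.
import numpy as np

def add_error(R_est, t_est, R_gt, t_gt, model_points):
    """model_points: (N, 3) array of 3D points sampled on the object model."""
    p_est = model_points @ R_est.T + t_est
    p_gt = model_points @ R_gt.T + t_gt
    return np.linalg.norm(p_est - p_gt, axis=1).mean()

# Toy check: identical poses yield zero error.
rng = np.random.default_rng(0)
points = rng.random((100, 3))
assert np.isclose(
    add_error(np.eye(3), np.zeros(3), np.eye(3), np.zeros(3), points), 0.0
)
```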
We introduce Bukva, a video dataset for the Russian dactyl (fingerspelling) recognition task. The dataset is about 27 GB in size and contains 3,757 RGB videos with more than 101 samples for each Russian Sign Language (RSL) alphabet sign, including dynamic ones. The dataset is split into training and test sets by subject user_id: the training set includes 3,097 videos, and the test set includes 660 videos. The total video recording time is about 4 hours. About 17% of the videos are recorded in HD, and 70% are in Full HD.
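A subject-disjoint split like the one described (train/test separated by user_id) can be reproduced with a few lines of Python, as in the sketch below; the sample records and field names are hypothetical, not Bukva's actual metadata format.

```python
# Minimal sketch of a subject-disjoint train/test split keyed on user_id,
# so no subject appears in both sets. Record fields are hypothetical.
def split_by_subject(samples, test_user_ids):
    train, test = [], []
    for s in samples:
        (test if s["user_id"] in test_user_ids else train).append(s)
    return train, test

samples = [
    {"video": "a.mp4", "user_id": 1, "sign": "A"},
    {"video": "b.mp4", "user_id": 2, "sign": "B"},
    {"video": "c.mp4", "user_id": 1, "sign": "V"},
]
train, test = split_by_subject(samples, test_user_ids={2})
assert all(s["user_id"] != 2 for s in train)
```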
CausalChaos! is a dataset for causal video question answering based on Tom and Jerry cartoons. It features longer causal chains embedded in dynamic visual scenes, as well as challenging incorrect options, notably the Causal Confusion set, which contains causally confounding incorrect options. All of these factors prove challenging for current VLMs and traditional video question answering models.
WiFiCam is a dataset for through-wall imaging based on WiFi channel state information (CSI). The corresponding source code repository is located at: https://github.com/StrohmayerJ/wificam
The Azerbaijani Sign Language Dataset (AzSLD) is a comprehensive, large-scale dataset designed to facilitate the development and evaluation of machine learning models for the recognition and translation of Azerbaijani Sign Language (AzSL).
VPData, the largest video inpainting dataset, comprises over 390K clips (>866.7 hours), featuring precise masks and detailed video captions.
This is the benchmark accompanying VPData, the largest video inpainting dataset, which comprises over 390K clips (>866.7 hours) and features precise masks and detailed video captions.
VETRA is a dataset for vehicle tracking in aerial image sequences that presents unique challenges such as low frame rates, small and fast-moving objects, and high camera movement. These characteristics allow numerous vehicles with varying motion behaviors to be tracked over large areas, and they pose new challenges for MOT algorithms. VETRA consists of 52 image sequences captured by airplanes and helicopters using DLR's 3k and 4k camera systems; the acquisition sites are located in Germany and Austria. In addition to the classical training, validation, and test sets, VETRA offers a second test set specifically designed for the application of large-area monitoring (LAM). The LAM sequences are recorded over 7 rural roads and motorways with a fixed camera speed and configuration, and each road section is captured at 4 different times of the day, enabling the performance of MOT algorithms to be evaluated under different traffic loads in a static environment.
InfiniteRep is a synthetic, open-source dataset for fitness and physical therapy (PT) applications. It includes 1k videos of diverse avatars performing multiple repetitions of common exercises, with significant variation in environment, lighting conditions, avatar demographics, and movement trajectories. From cadence to kinematic trajectory, each rep is performed slightly differently, just like real humans. InfiniteRep videos are accompanied by a rich set of pixel-perfect labels and annotations, including frame-specific repetition counts.