We introduce a new dataset, Watch and Learn Time-lapse (WALT), consisting of footage from multiple (4K and 1080p) cameras capturing urban environments over a year.
A new dataset with significant occlusions related to object manipulation.
300 Videos in the Wild (300-VW) is a dataset for evaluating facial landmark tracking algorithms in the wild. The dataset authors collected 114 long facial videos recorded in the wild, each roughly one minute in duration (at 25-30 fps). All frames have been annotated with the same mark-up (i.e., set of facial landmarks) used in the 300-W competition, for a total of 68 landmarks per frame.
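Since the annotations follow the 300-W mark-up, each frame's landmarks are typically stored as a 68-point .pts file. A minimal parser sketch, assuming the standard .pts layout; the path below is hypothetical.

# Parse a 300-W style .pts file: a "version" line, an "n_points" line,
# then the landmark coordinates wrapped in braces.
def read_pts(path):
    with open(path) as f:
        lines = [ln.strip() for ln in f if ln.strip()]
    assert lines[1].startswith("n_points"), "unexpected .pts header"
    n = int(lines[1].split(":")[1])  # typically 68 for 300-VW
    return [tuple(map(float, ln.split())) for ln in lines[3:3 + n]]

landmarks = read_pts("300VW/001/annot/000001.pts")  # hypothetical path
print(len(landmarks))  # expected: 68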
The Multi-camera Multiple People Tracking (MMPTRACK) dataset contains about 9.6 hours of video with over half a million frame-wise annotations. The dataset is densely annotated: per-frame bounding boxes and person identities are available, as well as camera calibration parameters. It was recorded at 15 frames per second (FPS) in five diverse and challenging environments: retail, lobby, industry, cafe, and office. This is by far the largest publicly available multi-camera multiple people tracking dataset.
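As an illustration of working with such dense annotations, a short sketch that groups frame-wise boxes into per-identity tracks; the flat record layout (camera, frame, person_id, x, y, w, h) is a hypothetical stand-in, not the actual MMPTRACK file format.

from collections import defaultdict

# Hypothetical record layout: (camera_id, frame_idx, person_id, x, y, w, h).
def build_tracks(records):
    tracks = defaultdict(list)  # person_id -> [(camera, frame, box), ...]
    for cam, frame, pid, x, y, w, h in records:
        tracks[pid].append((cam, frame, (x, y, w, h)))
    for pid in tracks:
        tracks[pid].sort(key=lambda r: (r[0], r[1]))  # by camera, then frame
    return tracks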
The exiD dataset is a collection of naturalistic road user trajectories recorded at highway entries and exits in Germany. The data was captured with drones, which avoid the occlusions that limit conventional traffic data collection methods and allow each road user's trajectory and type to be extracted with very high positional accuracy using computer vision algorithms. This collection technique minimizes measurement errors, making the dataset a reliable resource for research and development in automated driving.
IndustReal is an egocentric, multimodal dataset in which 27 participants perform assembly and maintenance procedures on a construction-toy car. The dataset is annotated for action recognition, assembly state detection, and procedure step recognition. IndustReal includes 38 execution errors across a total of 84 videos, 14 of which are exclusive to the validation and test sets, making the dataset suitable for testing the robustness of algorithms against unseen errors in procedural tasks. IndustReal offers open-source 3D models for all parts to promote the use of synthetic data for scalable approaches on this dataset, as well as reproducibility. All assembly parts used in the dataset are 3D printed, which ensures reproducibility and future availability of the model and allows for growth via community effort.
The aiMotive dataset is a multimodal dataset for robust autonomous driving with long-range perception. It consists of 176 scenes with synchronized and calibrated LiDAR, camera, and radar sensors covering a 360-degree field of view. The data was captured in highway, urban, and suburban areas during daytime, at night, and in rain, and is annotated with 3D bounding boxes with consistent identifiers across frames.
VOT2019 is a Visual Object Tracking benchmark for short-term tracking in RGB.
The eSports Sensors dataset contains sensor data collected from 10 players in 22 matches in League of Legends. The sensor data collected includes:
The UAVA (UAV-Assistant) dataset is specifically designed to foster applications that consider UAVs and humans as cooperative agents. We employ a real-world 3D scanned dataset (Matterport3D, https://niessner.github.io/Matterport/), physically-based rendering, and a gamified simulator for realistic drone navigation trajectory collection to generate realistic multimodal data, both from the user's exocentric view of the drone and from the drone's egocentric view.
The dataset is designed specifically to address a range of computer vision problems (2D-3D tracking, posture) that biologists face when designing behavior studies with animals.
DivEMT is the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages. Using a strictly controlled setup, 18 professional translators were instructed to translate or post-edit the same set of English documents into Arabic, Dutch, Italian, Turkish, Ukrainian, and Vietnamese. During the process, their edits, keystrokes, editing times, and pauses were recorded, enabling an in-depth, cross-lingual evaluation of NMT quality and post-editing effectiveness. Using this new dataset, we assess the impact of two state-of-the-art NMT systems, Google Translate and the multilingual mBART-50 model, on translation productivity.
One of the recent trends in vision problems is to use natural language captions to describe the objects of interest. This approach can overcome some limitations of traditional methods that rely on bounding boxes or category annotations. This paper introduces a novel paradigm for Multiple Object Tracking called Type-to-Track, which allows users to track objects in videos by typing natural language descriptions. We present a new dataset for this Grounded Multiple Object Tracking task, called GroOT, that contains videos with various types of objects and corresponding textual captions, totaling 256K words, describing their appearance and actions in detail. To cover a diverse range of scenes, GroOT was created using official videos and bounding box annotations from the MOT17, TAO, and MOT20 datasets.
The SoccerNet Game State Reconstruction task is a novel high-level computer vision task specific to sports analytics. It aims at recognizing the state of a sport game, i.e., identifying and localizing all sports individuals (players, referees, etc.) on the field from raw input video. SoccerNet-GSR is composed of 200 video sequences of 30 seconds each, annotated with 9.37 million line points for pitch localization and camera calibration, as well as over 2.36 million athlete positions on the pitch with their respective role, team, and jersey number.
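Conceptually, a reconstructed game state is a per-frame list of localized, identified individuals. A minimal sketch of such a record, with hypothetical field names that are illustrative rather than the official SoccerNet-GSR schema:

from dataclasses import dataclass
from typing import Optional

# Hypothetical per-athlete game-state record; the field names are
# illustrative, not the official SoccerNet-GSR annotation schema.
@dataclass
class AthleteState:
    track_id: int
    x_pitch: float                # position on the pitch, e.g. in meters
    y_pitch: float
    role: str                     # e.g. "player", "goalkeeper", "referee"
    team: Optional[str]           # e.g. "left" / "right"; None for referees
    jersey_number: Optional[int]  # None when not visible

# One frame of game state is then simply a list of AthleteState records.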
This is a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM to advance research in autonomous robotics and computer vision. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsic and extrinsic parameters of the sensors while addressing temporal synchronization. The recordings cover multi-floor buildings, gardens, and urban and highway scenarios. By combining handheld and car-based data collection, the setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes accurate 6-DoF ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through bundle adjustment. All sequences divi
The softwarised network data zoo (SNDZoo) is an open collection of software networking data sets aiming to streamline and ease machine learning research in the software networking domain. Most of the published data sets focus on, but are not limited to, the performance of virtualised network functions (VNFs). The data is collected using fully automated NFV benchmarking frameworks, such as tng-bench, developed by us, or third-party solutions like Gym. The collection of the presented data sets follows the general VNF benchmarking methodology described in.
The NBA SportVU dataset contains player and ball trajectories for 631 games from the 2015-2016 NBA season. The raw tracking data is in the JSON format, and each moment includes information about the identities of the players on the court, the identities of the teams, the period, the game clock, and the shot clock.
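A minimal sketch of unpacking one moment, assuming the commonly documented SportVU layout in which each moment is a list [period, timestamp_ms, game_clock, shot_clock, None, coordinates] and each coordinate row is [team_id, player_id, x, y, z]; this field order is an assumption to verify against the actual files.

import json

# Hypothetical file name; one JSON file per game.
with open("0021500001.json") as f:
    game = json.load(f)

event = game["events"][0]
period, ts_ms, game_clock, shot_clock, _, coords = event["moments"][0]
for team_id, player_id, x, y, z in coords:
    # By convention the ball appears first with team_id == -1; for the
    # ball, z encodes its height above the court.
    print(team_id, player_id, x, y, z)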
Existing image/video datasets for cattle behavior recognition are mostly small, lack well-defined labels, or are collected in unrealistic controlled environments. This limits the utility of machine learning (ML) models learned from them. Therefore, we introduce a new dataset, called Cattle Visual Behaviors (CVB), that consists of 502 video clips, each fifteen seconds long, captured in natural lighting conditions, and annotated with eleven visually perceptible behaviors of grazing cattle. By creating and sharing CVB, we aim to develop improved models capable of accurately recognizing all important cattle behaviors and to assist other researchers and practitioners in developing and evaluating new ML models for cattle behavior classification using video data. The dataset is organized into the following three sub-directories:
1. raw_frames: contains 450 frames in each sub-folder, representing a 15-second video taken at a frame rate of 30 FPS.
2. annotations: contains the json file
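As an illustration, the raw_frames layout can be read back with a short sketch that maps each frame index to a timestamp within its clip; the sequential image file naming is an assumption, not confirmed by the dataset description.

import os

FPS = 30  # 15-second clips at 30 FPS -> 450 frames per sub-folder

def clip_frames(clip_dir):
    # Assumes sequentially numbered image files inside each clip folder
    # (hypothetical naming; verify against the actual raw_frames layout).
    names = sorted(os.listdir(clip_dir))
    assert len(names) == 450, "expected a full 15 s clip at 30 FPS"
    for i, name in enumerate(names):
        timestamp_s = i / FPS  # seconds since the start of the clip
        yield timestamp_s, os.path.join(clip_dir, name)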
Data used for the paper "SparsePoser: Real-time Full-body Motion Reconstruction from Sparse Data".
This dataset comprises the 3D building information model (in IFC and Revit formats), manually elaborated from the terrestrial laser scan of sequence 2 of ConSLAM, and the refined ground-truth (GT) poses (in TUM format) of sessions 2, 3, 4, and 5 of the open-access ConSLAM dataset (which provides camera, LiDAR, and IMU measurements).
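Because the refined GT poses use the TUM trajectory format (one pose per non-comment line: timestamp tx ty tz qx qy qz qw), they can be loaded with a minimal parser; the file name below is hypothetical.

# Parse a TUM-format trajectory: "timestamp tx ty tz qx qy qz qw" per line,
# with translation in meters and rotation as a unit quaternion.
def load_tum_poses(path):
    poses = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            t, tx, ty, tz, qx, qy, qz, qw = map(float, line.split())
            poses.append((t, (tx, ty, tz), (qx, qy, qz, qw)))
    return poses

poses = load_tum_poses("conslam_session2_gt.txt")  # hypothetical file name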