29 machine learning datasets
29 dataset results
The Daimler Monocular Pedestrian Detection dataset is a dataset for pedestrian detection in urban environments. The training set contains 15560 pedestrian samples (image cut-outs at 48×96 resolution) and 6744 additional full images without pedestrians for extracting negative samples. The test set contains an independent sequence with more than 21790 images and 56492 pedestrian labels (fully visible or partially occluded), captured from a vehicle during a 27 min driving through the urban traffic.
The datasets includes curves drawn on 3D surfaces (triangle meshes) in Virtual Reality. A total of 2,880 curves were created using two different techniques by 20 users on 6 meshes. For each curve, a 3D curve executed by the user is provided, the projected curve created on the mesh, and the ground truth target curve on the mesh. For collecting the data, two different task types were employed, which are described in the paper.
A large-scale, first-of-its-kind database aimed at generating a better understanding of the way children interact with mobile devices during their development process. ChildCIdbv1 comprises data collected from 438 children, from 18 months to 8 years old, encompassing the first three development stages of Piaget's theory. Data collected spans interaction with screens using both finger and pen stylus, information regarding the previous experience of the child with mobile devices, the child’s grade level, and whether attention-deficit/hyperactivity disorder (ADHD) is present.
The Mafia Dataset was created to model the behavior of deceptive actors in the context of the Mafia game, as described in the paper “Putting the Con in Context: Identifying Deceptive Actors in the Game of Mafia”. We hope that this dataset will be of use to others studying the effects of deception on language use.
In this dataset two robots, Baxter and UR5, perform 8 behaviors (look, grasp, pick, hold, shake, lower, drop, and push) on 95 objects that vary by 5 color (blue, green, red, white, and yellow), 6 contents (wooden button, plastic dices, glass marbles, nuts & bolts, pasta, and rice), and 4 weights (empty, 50g, 100g, and 150g). There are 90 objects with contents (5 colors x 3 weights x 6 contents) and 5 objects without any content that only vary by 5 colors. Both robots perform 5 trials on each object, resulting in 7,600 interactions (2 robots x 8 behaviors x 95 objects x 5 trials
In this dataset UR5 robot used 6 tools: metal-scissor, metal-whisk, plastic-knife, plastic-spoon, wooden-chopstick, and wooden-fork to perform 6 behaviors: look, stirring-slow, stirring-fast, stirring-twist, whisk, and poke. The robot explored 15 objects: cane-sugar, chia-seed, chickpea, detergent, empty, glass-bead, kidney-bean, metal-nut-bolt, plastic-bead, salt, split-green-pea, styrofoam-bead, water, wheat, and wooden-button kept cylindrical containers. The robot performed 10 trials on each object using a tool, resulting in 5,400 interactions (6 tools x 6 behaviors x 15 objects x 10 trials). The robot records multiple sensory data (audio, RGB images, depth images, haptic, and touch images) while interacting with the objects.
In this dataset an uppertorso humanoid robot with 7-DOF arm explored 100 different objects belonging to 20 different categories using 10 behaviors: Look, Crush, Grasp, Hold, Lift, Drop, Poke, Push, Shake and Tap.
We use Something-Something v2 dataset to obtain the generation prompts and ground truth masks from real action videos. We filter out a set of 295 prompts. The details for this filtering are in the "Peekaboo: Interactive Video Generation via Masked-Diffusion" paper. We then use an off-the-shelf OWL-ViT-large open-vocabulary object detector to obtain the bounding box (bbox) annotations of the object in the videos. This set represents bbox and prompt pairs of real-world videos, serving as a test bed for both the quality and control of methods for generating realistic videos with spatio-temporal control.
We conducted a large crowdsourcing study of click patterns in an interactive segmentation scenario and collected 475K real-user clicks. Drawing on ideas from saliency tasks, we develop a clickability model that enables sampling clicks, which closely resemble actual user inputs. Using our model and dataset, we propose RClicks benchmark for a comprehensive comparison of existing interactive segmentation methods on realistic clicks. Specifically, we evaluate not only the average quality of methods, but also the robustness w.r.t. click patterns.