3,275 machine learning datasets
The Freiburg Poking dataset is a dataset for learning intuitive physics from physical interaction. It consists of 40K interactions recorded with a KUKA LBR iiwa manipulator and a fixed Azure Kinect RGB-D camera. The dataset creators built a styrofoam arena with walls to prevent objects from falling off. At any given time, 3-7 objects randomly chosen from a set of 34 distinct objects were present in the arena. The objects differed in shape, appearance, material, mass, and friction.
The Parzival dataset consists of 47 pages written by three writers. These pages were taken from a 13th-century medieval German manuscript containing the epic poem Parzival by Wolfram von Eschenbach. The image size is 2000 x 3000 pixels. 24 pages are used for training, 14 for testing, and 2 for validation.
The Freiburg RGB-D People dataset contains 3000+ RGB-D frames acquired in a university hall from three vertically mounted Kinect sensors. The data contain mostly upright walking and standing persons seen from different orientations and with different levels of occlusion.
PAVIS RGB-D is a dataset for person re-identification using depth information. The main motivation is that techniques such as SDALF fail when individuals change their clothing, and therefore cannot be used for long-term video surveillance. Depth information addresses this problem because it stays constant for a longer period of time. The dataset is composed of four different groups of data collected using the Kinect. The first group ("Collaborative") was obtained by recording 79 people with a frontal view, walking slowly, avoiding occlusions, and with stretched arms. Recording took place in an indoor scenario, with the people at least 2 meters away from the camera. The second ("Walking1") and third ("Walking2") groups consist of frontal recordings of the same 79 people walking normally while entering the lab where they normally work. The fourth group ("Backwards") is a back-view recording of the people walking away from the lab.
KTH Multiview Football I is a dataset of football players with annotated joints that can be used for multi-view reconstruction. The dataset includes 771 images of football players, taken from 3 views at 257 time instances, with 14 annotated body joints.
Sugar Beets 2016 is a robot dataset for plant classification as well as localization and mapping that covers the relevant stages for robotic intervention and weed control. It contains around 5 TB of data recorded from a robot with a 4-channel multi-spectral camera and an RGB-D sensor, capturing detailed information about the plantation.
USYD CAMPUS is a driving dataset collected by Zhou et al. at the University of Sydney (USyd) campus and surroundings. It contains more than 60 weeks of drives and is continuously updated. It includes multiple sensor modalities (camera, lidar, GPS, IMU, wheel encoder, steering angle, etc.) and covers varied environmental conditions as well as diverse changes in illumination, scene structure, and pedestrian/vehicle traffic volumes.
NeuB1 is a microscopic neuronal image dataset for retinal vessel segmentation, which contains 112 images of size 512 x 152. The train/test split is 37/75. Image Source: https://web.bii.a-star.edu.sg/~zhaoh/Jaydeep_Tracing/
The Fabrics Dataset consists of about 2,000 samples of garments and fabrics. A small patch of each surface was captured under 4 different illumination conditions using a custom-made, portable photometric stereo sensor. All images were acquired "in the field" (at clothes shops), so the dataset reflects the distribution of fabrics in the real world and is therefore not balanced: the majority of clothes are made of specific fabrics, such as cotton and polyester, while other fabrics, such as silk and linen, are rarer. In addition, a large number of clothes are not composed of a single fabric; two or more fabrics are blended to give the garment the desired properties. For every garment there is attribute information about its material composition, taken from the manufacturer's label, and its type (pants, shirt, skirt, etc.).
The Large Scale Facial Model (LSFM) is a 3D statistical model of facial shape built from nearly 10,000 individuals.
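Statistical facial models of this kind are typically linear (PCA) morphable models: a face is expressed as the mean shape plus a weighted combination of principal components. Below is a minimal sketch of that construction; the array names and dimensions are placeholders standing in for the learned model, not LSFM's actual file format or vertex count.

```python
import numpy as np

# Placeholder dimensions, for illustration only; the real model is denser.
N_VERTICES = 5_000       # assumed vertex count
N_COMPONENTS = 158       # assumed number of retained components

rng = np.random.default_rng(0)
mean_shape = np.zeros(3 * N_VERTICES)                        # flattened (x, y, z)
basis = rng.standard_normal((3 * N_VERTICES, N_COMPONENTS))  # stand-in PCA basis
stddevs = np.linspace(1.0, 0.01, N_COMPONENTS)               # stand-in per-component spread

def instantiate_face(coeffs: np.ndarray) -> np.ndarray:
    """Generate a face mesh from shape coefficients.

    coeffs are in units of standard deviations, so coeffs = 0
    reproduces the mean face.
    """
    offsets = basis @ (coeffs * stddevs)
    return (mean_shape + offsets).reshape(-1, 3)  # (N_VERTICES, 3)

# Sample a random plausible face by drawing coefficients from N(0, 1).
face = instantiate_face(rng.standard_normal(N_COMPONENTS))
print(face.shape)  # (5000, 3)
```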
The DR HAGIS database has been created to aid the development of vessel extraction algorithms suitable for retinal screening programmes. Researchers are encouraged to test their segmentation algorithms using this database.
The VICAVR database is a set of retinal images used for the computation of the A/V ratio. The database currently includes 58 images. The images have been acquired with a TopCon NW-100 non-mydriatic camera and are optic-disc centered with a resolution of 768x584. The database includes the caliber of the vessels measured at different radii from the optic disc, as well as the vessel type (artery/vein), labelled by three experts.
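The A/V ratio itself reduces to comparing representative artery and vein calibers at a given radius from the optic disc. A minimal sketch over the kind of labelled caliber measurements the database provides follows; the record layout is hypothetical, and simple means stand in for the clinically defined caliber summaries.

```python
from statistics import mean

# Hypothetical records: (radius index, vessel type, measured caliber in pixels).
measurements = [
    (1, "artery", 7.2), (1, "artery", 6.8),
    (1, "vein", 9.5),   (1, "vein", 10.1),
]

def av_ratio(measurements, radius):
    """Ratio of mean artery caliber to mean vein caliber at one radius."""
    arteries = [c for r, kind, c in measurements if r == radius and kind == "artery"]
    veins    = [c for r, kind, c in measurements if r == radius and kind == "vein"]
    return mean(arteries) / mean(veins)

print(round(av_ratio(measurements, radius=1), 3))  # 0.714
```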
The database consists of 89 colour fundus images, of which 84 contain at least mild non-proliferative signs of diabetic retinopathy (microaneurysms) and 5 are considered normal, containing no signs of diabetic retinopathy according to all experts who participated in the evaluation. Images were captured using the same 50-degree field-of-view digital fundus camera with varying imaging settings. The data correspond to a good (not necessarily typical) practical situation, where the images are comparable and can be used to evaluate the general performance of diagnostic methods. This data set is referred to as "calibration level 1 fundus images".
The LfED-6D dataset is a collection of 6D grasp annotations acquired through experience (with a robot platform) or by human demonstration. For known objects, the annotated grasps can be directly applied, provided the pose of the object model is correctly computed. For unknown objects, the grasps can be generalized using shape-matching methods, for example the Dense Geometrical Correspondence Network.
Contains images of Arabic handwritten digits (60,000 training and 10,000 test images).
The first public dataset dedicated to Latin (French) and Arabic scene text detection in highway panels. It comprises more than 1,800 well-annotated images, collected from Moroccan highways and manually annotated. ASAYAR data can be used to develop and evaluate traffic sign detection as well as French or Arabic text detection.
This dataset contains 13,427 camera images at a resolution of 1280x720 pixels with about 24,000 annotated traffic lights. The annotations include bounding boxes of the traffic lights as well as the current state (active light) of each traffic light. The camera images are provided both as raw 12-bit HDR images taken with a red-clear-clear-blue filter and as reconstructed 8-bit RGB color images. The RGB images are provided for debugging and can also be used for training; however, the RGB conversion process has some drawbacks: some of the converted images may contain artifacts, and the color distribution may seem unusual.
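Those conversion drawbacks stem from the raw-to-RGB pipeline: the 12-bit red-clear-clear-blue mosaic must be demosaiced and tone-mapped down to 8 bits. Demosaicing is sensor-specific, but the bit-depth reduction can be sketched as below with a simple gamma tone map; the dataset's actual conversion procedure is not documented here.

```python
import numpy as np

def hdr12_to_8bit(raw: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """Compress 12-bit linear intensities into 8 bits with a gamma curve.

    raw: uint16 array holding 12-bit values in [0, 4095].
    A plain linear rescale would crush shadow detail, which is one reason
    converted images can show artifacts or an unusual color distribution.
    """
    linear = np.clip(raw.astype(np.float32) / 4095.0, 0.0, 1.0)
    tone_mapped = linear ** (1.0 / gamma)
    return (tone_mapped * 255.0 + 0.5).astype(np.uint8)

# Example on a synthetic 12-bit frame at the dataset's 1280x720 resolution.
frame = np.random.randint(0, 4096, size=(720, 1280), dtype=np.uint16)
eight_bit = hdr12_to_8bit(frame)
print(eight_bit.dtype, eight_bit.min(), eight_bit.max())
```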
This dataset is being constructed specifically to support research on techniques that bridge the gap between 2D, appearance-based recognition techniques, and fully 3D approaches. It is designed to simulate, in a controlled fashion, realistic surveillance conditions and to probe the efficacy of exploiting 3D models in real scenarios.
A database of images with measured probabilities that each picture will be remembered after a single view.
The dataset consists of multimodal facial images of 52 people (14 females, 38 males) obtained with a Kinect. The data were captured in two sessions held at different times (about half a month apart). In each session, the dataset provides facial images of each person in 9 states covering different facial expressions, lighting, and occlusion conditions: neutral, smile, open mouth, left profile, right profile, occluded eyes, occluded mouth, occlusion by paper, and light on [Figure 1]. All images are provided in three sources of information: the RGB color image, the depth map (provided both as a bitmap depth image and as a text file containing the original depth levels sensed by the Kinect), and 3D data. In addition, the dataset comes with manual landmarks for 6 facial positions: left eye, right eye, tip of the nose, left side of the mouth, right side of the mouth, and the chin [Figure 2]. Other information about each person, such as gender, year of birth, and whether they wear glasses, is also provided.