3,275 machine learning datasets
This dataset is used for predicting house prices from both images and textual information. It is composed of 535 sample houses from California, USA.
Image Caption Quality Dataset is a dataset of crowdsourced ratings for machine-generated image captions. It contains more than 600k ratings of image-caption pairs.
The iNaturalist Fine-Grained Geolocation dataset is an extension of the iNaturalist dataset with complementary geolocation information.
Involves data in which a robot interacts with 5.1 cm colored blocks to complete an order-fulfillment-style block stacking task. It contains dynamic scenes and real time-series data in a less constrained environment than comparable datasets. There are nearly 12,000 stacking attempts and over 2 million frames of real data.
A dataset explicitly created for Human-Computer Interaction (HCI) research.
A dataset containing 9,372 RGB images of weeds annotated with leaf counts. The images were collected in fields across Denmark using Nokia and Samsung cell phone cameras; Samsung, Nikon, Canon and Sony consumer cameras; and a Point Grey industrial camera.
The LEMMA dataset aims to explore the essence of complex human activities in a goal-directed, multi-agent, multi-task setting with ground-truth labels of compositional atomic actions and their associated tasks. By limiting the scenarios to at most two multi-step tasks with two agents, the authors address human multi-task and multi-agent interactions in four settings: single-agent single-task (1 x 1), single-agent multi-task (1 x 2), multi-agent single-task (2 x 1), and multi-agent multi-task (2 x 2). In the 2 x 1 setting, task instructions are given to only one agent to resemble a robot-helping scenario, with the hope that the learned perception models can be applied to robotic tasks (especially human-robot interaction) in the near future.
LISA Gaze is a dataset for driver gaze estimation comprising 11 long drives, driven by 10 subjects in two different cars.
Consists of a large number of unconstrained multi-view and partially occluded faces.
METU-ALET is an image dataset for detecting tools in the wild. The dataset has annotations for tools belonging to categories such as farming, gardening, office, stonemasonry, vehicle, woodworking and workshop. The images contain a total of 22,841 bounding boxes covering 49 different tool categories.
The Multiple Light Source dataset (MLS) is a collection of 24 multiple-object scenes, each recorded under 18 multiple-light-source illumination scenarios. The illuminants vary in dominant spectral colour, intensity and distance from the scene. The dataset can be used for evaluating computational colour constancy algorithms. Along with the images of the scenes, the spectral characteristics of the camera, light sources and objects are also provided. Each image includes pixel-by-pixel ground-truth annotation of uniformly coloured object surfaces, making the dataset useful for benchmarking colour-based image segmentation algorithms.
MNIST-MIX is a multi-language handwritten digit recognition dataset. It contains digits from 10 different languages.
NavigationNet is a computer vision dataset and benchmark to allow the utilization of deep reinforcement learning on scene-understanding-based indoor navigation.
The PSU Near-Regular Texture Database is a texture dataset. It covers the spectrum of textures from completely regular to near-regular to irregular. It also includes video of near-regular textures in motion. The database also contains, or will include, test image sets with ground truth for translation, rotation, and reflection/glide-reflection symmetry detection algorithms.
A dataset to encourage research in orchard environments. It consists of labeled stereo video of people in orange and apple orchards taken from two perception platforms (a tractor and a pickup truck), along with vehicle position data from RTK GPS.
Open MIC (Open Museum Identification Challenge) contains photos of exhibits captured in 10 distinct exhibition spaces of several museums, showcasing paintings, timepieces, sculptures, glassware, relics, science exhibits, natural history pieces, ceramics, pottery, tools and indigenous crafts. The goal of Open MIC is to stimulate research in domain adaptation, egocentric recognition and few-shot learning by providing a testbed complementary to the well-known Office-31 dataset.
The pic2kal benchmark for calorie prediction contains 308,000 images from over 70,000 recipes including photographs, ingredients and instructions, matched with nutritional information.
The Pinterest Complete the Look dataset consists of over 1 million outfits and 4 million objects. It can be used to predict style compatibility between fashion items in order to recommend complementary items that complete an outfit.
This dataset contains nine video sequences captured by a webcam for evaluating salient closed boundary tracking. Each sequence is about 30 seconds long at 30 fps, with a frame size of 640×480 (width×height); there are 9,598 frames in total. Each sequence exhibits different motion styles such as translation, rotation and viewpoint change.
Salient-KITTI is a saliency map prediction dataset based on KITTI.