3,275 machine learning datasets
3,275 dataset results
The BTAD ( beanTech Anomaly Detection) dataset is a real-world industrial anomaly dataset. The dataset contains a total of 2830 real-world images of 3 industrial products showcasing body and surface defects.
Composed Image Retrieval (or, Image Retreival conditioned on Language Feedback) is a relatively new retrieval task, where an input query consists of an image and short textual description of how to modify the image.
CHASE_DB1 is a dataset for retinal vessel segmentation which contains 28 color retina images with the size of 999×960 pixels which are collected from both left and right eyes of 14 school children. Each image is annotated by two independent human experts.
Set11 is a dataset of 11 grayscale images. It is a dataset used for image reconstruction and image compression.
The Middlebury 2014 dataset contains a set of 23 high resolution stereo pairs for which known camera calibration parameters and ground truth disparity maps obtained with a structured light scanner are available. The images in the Middlebury dataset all show static indoor scenes with varying difficulties including repetitive structures, occlusions, wiry objects as well as untextured areas.
CASIA-MFSD is a dataset for face anti-spoofing. It contains 50 subjects, and 12 videos for each subject under different resolutions and light conditions. Three different spoof attacks are designed: replay, warp print and cut print attacks. The database contains 600 video recordings, in which 240 videos of 20 subjects are used for training and 360 videos of 30 subjects for testing.
ETH is a dataset for pedestrian detection. The testing set contains 1,804 images in three video clips. The dataset is captured from a stereo rig mounted on car, with a resolution of 640 x 480 (bayered), and a framerate of 13--14 FPS.
The Exclusively Dark (ExDARK) dataset is a collection of 7,363 low-light images from very low-light environments to twilight (i.e 10 different conditions) with 12 object classes (similar to PASCAL VOC) annotated on both image class level and local object bounding boxes.
We introduce a dataset of 147 object categories containing over 6000 images that are suitable for the few-shot counting task. We collected and annotated images ourselves. Our dataset consists of 6135 images across a di- verse set of 147 object categories, from kitchen utensils and office stationery to vehicles and animals. The object count in our dataset varies widely, from 7 to 3731 objects, with an average count of 56 objects per image. In each image, each object instance is annotated with a dot at its approxi- mate center. In addition, three object instances are selected randomly as exemplar instances; these exemplars are also annotated with axis-aligned bounding boxes.
SQA3D is a dataset for embodied scene understanding, where an agent needs to comprehend the scene it situates from an first person's perspective and answer questions. The questions are designed to be situated, embodied and knowledge-intensive. We offer three different modalities to represent a 3D scene: 3D scan, egocentric video and BEV picture.
The Stanford Dogs dataset contains 20,580 images of 120 classes of dogs from around the world, which are divided into 12,000 images for training and 8,580 images for testing.
Composition-1K is a large-scale image matting dataset including 49300 training images and 1000 testing images.
Includes 4000 images; 200 from each of 20 categories covering different types of scenes such as Cartoons, Art, Objects, Low resolution images, Indoor, Outdoor, Jumbled, Random, and Line drawings.
The EYEDIAP dataset is a dataset for gaze estimation from remote RGB, and RGB-D (standard vision and depth), cameras. The recording methodology was designed by systematically including, and isolating, most of the variables which affect the remote gaze estimation algorithms:
The TotalCapture dataset consists of 5 subjects performing several activities such as walking, acting, a range of motion sequence (ROM) and freestyle motions, which are recorded using 8 calibrated, static HD RGB cameras and 13 IMUs attached to head, sternum, waist, upper arms, lower arms, upper legs, lower legs and feet, however the IMU data is not required for our experiments. The dataset has publicly released foreground mattes and RGB images. Ground-truth poses are obtained using a marker-based motion capture system, with the markers are <5mm in size. All data is synchronised and operates at a framerate of 60Hz, providing ground truth poses as joint positions.
The Wireframe dataset consists of 5,462 images (5,000 for training, 462 for test) of indoor and outdoor man-made scenes.
PGM dataset serves as a tool for studying both abstract reasoning and generalisation in models. Generalisation is a multi-faceted phenomenon; there is no single, objective way in which models can or should generalise beyond their experience. The PGM dataset provides a means to measure the generalization ability of models in different ways, each of which may be more or less interesting to researchers depending on their intended training setup and applications.
RELLIS-3D is a multi-modal dataset for off-road robotics. It was collected in an off-road environment containing annotations for 13,556 LiDAR scans and 6,235 images. The data was collected on the Rellis Campus of Texas A&M University and presents challenges to existing algorithms related to class imbalance and environmental topography. The dataset also provides full-stack sensor data in ROS bag format, including RGB camera images, LiDAR point clouds, a pair of stereo images, high-precision GPS measurement, and IMU data.
Dark Zurich is an image dataset containing a total of 8779 images captured at nighttime, twilight, and daytime, along with the respective GPS coordinates of the camera for each image. These GPS annotations are used to construct cross-time-of-day correspondences, i.e., to match each nighttime or twilight image to its daytime counterpart.
PA-100K is a recent-proposed large pedestrian attribute dataset, with 100,000 images in total collected from outdoor surveillance cameras. It is split into 80,000 images for the training set, and 10,000 for the validation set and 10,000 for the test set. This dataset is labeled by 26 binary attributes. The common features existing in both selected dataset is that the images are blurry due to the relatively low resolution and the positive ratio of each binary attribute is low.