3,275 machine learning datasets
3,275 dataset results
BCCD is a small-scale dataset for blood cells detection.
MultiSense is a dataset of 9,504 images annotated with an English verb and its translation in Spanish and German.
STAIR Captions is a large-scale dataset containing 820,310 Japanese captions. This dataset can be used for caption generation, multimodal retrieval, and image generation.
The 3DNet dataset is a free resource for object class recognition and 6DOF pose estimation from point cloud data. 3DNet provides a large-scale hierarchical CAD-model databases with increasing numbers of classes and difficulty with 10, 60 and 200 object classes together with evaluation datasets that contain thousands of scenes captured with an RGB-D sensor.
CSAW-S is a dataset of mammography images which includes expert annotations of tumors and non-expert annotations of breast anatomy and artifacts in the image.
The FIGR-8 database is a dataset containing 17,375 classes of 1,548,256 images representing pictograms, ideograms, icons, emoticons or object or conception depictions. Its aim is to set a benchmark for Few-shot Image Generation tasks, albeit not being limited to it. Each image is represented by 192x192 pixels with grayscale value of 0-255. Classes are not balanced (they do not all contain the same number of elements), but they all do contain at the very least 8 images.
EDEN (Enclosed garDEN) is a multimodal synthetic dataset, a dataset for nature-oriented applications. The dataset features more than 300K images captured from more than 100 garden models. Each image is annotated with various low/high-level vision modalities, including semantic segmentation, depth, surface normals, intrinsic colors, and optical flow.
Provides a large-scale synthetic dataset which contains accurate ground truth depth of various photo-realistic scenes.
AeroRIT is a hyperspectral dataset to facilitate aerial hyperspectral scene understanding.
This self-driving dataset collected in Brno, Czech Republic contains data from four WUXGA cameras, two 3D LiDARs, inertial measurement unit, infrared camera and especially differential RTK GNSS receiver with centimetre accuracy.
A large-scale anime image database with 4.2m+ images annotated with 130m+ text tags describing image contents in detail; it can be useful for machine learning purposes such as image recognition and generation. It has been applied to a wide variety of applications, particularly generative modeling.
The DDI-100 dataset is a synthetic dataset for text detection and recognition based on 7000 real unique document pages and consists of more than 100000 augmented images. The ground truth comprises text and stamp masks, text and characters bounding boxes with relevant annotations.
This dataset provides a large number of training and testing example which is sufficient for a deep learning approach to address Dunhuang Grotto Painting restoration.
Egocentric Dataset of the University of Barcelona – Segmentation (EDUB-Seg) is a dataset for egocentric event segmentation acquired by the Narrative Clip, which takes a picture every 30 seconds. The dataset contains a total of 18,735 images captured by 7 different users during overall 20 days. To ensure diversity, all users were wearing the camera in different contexts: while attending a conference, on holiday, during the weekend, and during the week.
48k question-answer pairs written in rich natural language.
The endoscopic SLAM dataset (EndoSLAM) is a dataset for depth estimation approach for endoscopic videos. It consists of both ex-vivo and synthetically generated data. The ex-vivo part of the dataset includes standard as well as capsule endoscopy recordings. The dataset is divided into 35 sub-datasets. Specifically, 18, 5 and 12 sub-datasets exist for colon, small intestine and stomach respectively.
ESAD is a large-scale dataset designed to tackle the problem of surgeon action detection in endoscopic minimally invasive surgery. ESAD aims at contributing to increase the effectiveness and reliability of surgical assistant robots by realistically testing their awareness of the actions performed by a surgeon. The dataset provides bounding box annotation for 21 action classes on real endoscopic video frames captured during prostatectomy, and was used as the basis of a recent MIDL 2020 challenge.
Propose a dataset which adopts multi-channel visual input.
Kitchen Scenes is a multi-view RGB-D dataset of nine kitchen scenes, each containing several objects in realistic cluttered environments including a subset of objects from the BigBird dataset. The viewpoints of the scenes are densely sampled and objects in the scenes are annotated with bounding boxes and in the 3D point cloud.
MJU-Waste is an RGBD waste object segmentation dataset that is made public to facilitate future research in this area.