3,275 machine learning datasets
CAT is a specialized dataset for co-saliency detection, one of the core tasks in computer vision. The dataset is intended both to help assess the performance of vision algorithms and to support research that exploits large volumes of annotated data, e.g., for training deep neural networks. CAT consists of 33,500 images
VQA-MHUG is a 49-participant dataset of multimodal human gaze on both images and questions during visual question answering (VQA) collected using a high-speed eye tracker.
The Lincolnbeet dataset is an object detection dataset designed to encourage research in the identification of items in environments with high levels of occlusion, and in the development of better approaches to evaluate object detection models in practical scenarios. This dataset was introduced in the paper: "Towards practical object detection for weed spraying in precision agriculture".
Riseholme-2021 contains more than 3,500 images of strawberries at various growth stages, along with anomalous instances. Data collection was performed in the strawberry research farm at the Riseholme campus of the University of Lincoln in the UK.
The CCIHP dataset is devoted to fine-grained description of people in the wild with localized and characterized semantic attributes. It contains 20 attribute classes and 20 characteristic classes split into 3 categories (size, pattern and color). The dataset was introduced in this paper: Loesch, A., & Audigier, R. (2021, September). Describe me if you can! Characterized instance-level human parsing. In 2021 IEEE International Conference on Image Processing (ICIP) (pp. 2528-2532). IEEE. The annotations were made with Pixano, an open-source, smart annotation tool for computer vision applications: https://pixano.cea.fr/
Food Drinks and groceries Images Multi Lingual (FooDI-ML) is a dataset that contains over 1.5M unique images and over 9.5M store names, product names, descriptions, and collection sections gathered from the Glovo application. The data covers food, drinks and groceries products from 37 countries in Europe, the Middle East, Africa and Latin America. The dataset spans 33 languages, including 870K samples in languages of countries from Eastern Europe and Western Asia, such as Ukrainian and Kazakh, which have so far been underrepresented in publicly available visiolinguistic datasets. The dataset also includes widely spoken languages such as Spanish and English.
This dataset contains over 400,000 images (illustrations) from Niconico Seiga and Niconico Shunga.
This dataset consists of more than 16,000 retinal OCT B-scans from 441 cases (Normal: 120, Drusen: 160, CNV: 161), acquired at Noor Eye Hospital, Tehran, Iran. Images are labeled by a retinal specialist.
The Rogue Wave Dataset-10K consists of 10,191 rogue wave images.
A simple dataset consisting of three geometric shapes (Triangle, Rectangle, Ellipsoid) of similar sizes but different orientations.
MUNO21 is a large-scale and comprehensive dataset for the map update task. It includes time series of aerial images and map data to capture the evolution of both the physical road network and real street maps over time. We collect NAIP aerial images in each of four years over the eight-year timespan from 2012 to 2019, and OSM extracts from each year during the same timespan.
This ImageNet version contains only 50 training images per class while the original testing set remains unchanged. It is one of the datasets comprising the data-efficient image classification (DEIC) benchmark. It was proposed to challenge the generalization capabilities of modern image classifiers.
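As an illustration of the kind of per-class subsampling described above, here is a minimal sketch. It is not the procedure used to build the benchmark split: the function name `subsample_per_class`, the `(item, label)` input format, and the fixed seed are all assumptions made for this example.

```python
import random
from collections import defaultdict


def subsample_per_class(samples, per_class=50, seed=0):
    """Keep at most `per_class` randomly chosen items for each label.

    `samples` is an iterable of (item, label) pairs. This input format is
    hypothetical and chosen only for the sketch.
    """
    # Group items by their class label.
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)

    # A seeded generator makes the chosen subset reproducible.
    rng = random.Random(seed)
    subset = []
    for label in sorted(by_label):
        items = by_label[label]
        chosen = rng.sample(items, min(per_class, len(items)))
        subset.extend((item, label) for item in chosen)
    return subset


# Toy data: 3 classes with 200 file names each (made-up names).
toy = [(f"img_{c}_{i}.jpg", c) for c in range(3) for i in range(200)]
subset = subsample_per_class(toy, per_class=50)
```

The test set is left untouched in this scheme; only the training pool is reduced, which matches the description of an unchanged original testing set.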
An object detection dataset featuring people walking on grass, captured aboard a UAV. The dataset includes precise metadata such as altitude and viewing angle, among others.
NYU-VPR is a dataset for visual place recognition (VPR) that contains more than 200,000 images covering a 2km×2km area near the New York University campus, taken throughout the year 2016.
TMBuD is a dataset for building recognition and 3D reconstruction of human-made structures in urban scenarios. The dataset features 160 images of buildings from Timişoara, Romania, with a resolution of 768 x 1024 pixels each. The dataset allows proper evaluation of salient edge detection and semantic segmentation of images, focusing on the street view perspective.
The dataset contains aerial agricultural images of a potato field with manual labels of healthy and stressed plant regions. The images were collected with a Parrot Sequoia multispectral camera carried by a 3DR Solo drone flying at an altitude of 3 meters. The dataset consists of RGB images with a resolution of 750×750 pixels, and spectral monochrome red, green, red-edge, and near-infrared images with a resolution of 416×416 pixels, and XML files with annotated bounding boxes of healthy and stressed potato crop.
A synthetic dataset containing 447 typefaces with only one font variation for each typeface, created for visual font recognition.
A synthetic dataset containing word images of 447 typefaces with font variations for each typeface, created for visual font recognition.
BPCIS is a collection of 364 bacterial phase contrast images and corresponding label matrices for instance segmentation. Labels were made according to fluorescence channels where possible. Prior to manual annotation, images were automatically cropped into microcolonies and tiled into ensemble images to reduce the empty (non-cell) image regions for training and testing. Subsequent to annotation, we performed non-rigid registration of phase contrast to cell masks.