We introduce GLAMI-1M: the largest multilingual image-text classification dataset and benchmark. The dataset contains images of fashion products with item descriptions, each in 1 of 13 languages. Categorization into 191 classes has high-quality annotations: all 100k images in the test set and 75% of the 1M training set were human-labeled. The paper presents baselines for image-text classification, showing that the dataset poses a challenging fine-grained classification problem: the best-scoring EmbraceNet model, using both visual and textual features, achieves 69.7% accuracy. Experiments with a modified Imagen model show the dataset is also suitable for image generation conditioned on text.
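As a rough illustration of the kind of late-fusion image-text baseline reported for GLAMI-1M, the sketch below projects image and text features to a shared width, concatenates them, and classifies into 191 categories. The feature extractors, dimensions, and names are illustrative assumptions, not the paper's exact EmbraceNet configuration.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Minimal image-text fusion classifier: project each modality to a
    shared width, concatenate, and classify into 191 categories.
    All dimensions here are illustrative assumptions."""

    def __init__(self, img_dim=2048, txt_dim=768, hidden=512, n_classes=191):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)  # e.g. pooled CNN features
        self.txt_proj = nn.Linear(txt_dim, hidden)  # e.g. multilingual BERT [CLS]
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(2 * hidden, n_classes))

    def forward(self, img_feat, txt_feat):
        fused = torch.cat([self.img_proj(img_feat), self.txt_proj(txt_feat)], dim=-1)
        return self.head(fused)  # unnormalized class logits

# Toy usage with random features standing in for real extractor outputs.
model = LateFusionClassifier()
print(model(torch.randn(4, 2048), torch.randn(4, 768)).shape)  # torch.Size([4, 191])
```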
The SeaDronesSee Object Detection v2 (S-ODv2) dataset contains 14,227 RGB images (training: 8,930; validation: 1,547; testing: 3,750). The images are captured from altitudes ranging from 5 to 260 meters and viewing angles (gimbal pitch) from 0° to 90°, with the respective altitude, viewing-angle, and other metadata provided for almost all frames.
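Because altitude and gimbal-pitch metadata accompany almost every frame, a natural first step is slicing the dataset by capture conditions. The sketch below assumes a COCO-style annotation file with a hypothetical per-image 'altitude' field; the actual S-ODv2 field names may differ.

```python
import json

def frames_in_altitude_band(annotation_path, lo=5.0, hi=60.0):
    """Return image records captured within an altitude band (meters).

    Assumes a COCO-style JSON whose image entries carry a hypothetical
    'altitude' meta field; adapt the key to the dataset's actual schema.
    """
    with open(annotation_path) as f:
        coco = json.load(f)
    return [img for img in coco["images"]
            if lo <= img.get("altitude", float("-inf")) <= hi]
```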
The Apron Dataset focuses on training and evaluating classification and detection models for airport-apron logistics. In addition to bounding boxes and object categories, the dataset is enriched with meta parameters to quantify the models' robustness against environmental influences.
A one-shot affordance part segmentation variant of the UMD dataset. Each object instance in the dataset is represented by a single image.
MatSim is a synthetic dataset and natural-image benchmark for computer-vision-based recognition of similarities and transitions between materials and textures. It focuses on identifying any material under any conditions from one or a few examples (one-shot learning), including material states and subclasses.
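One-shot material recognition of the kind MatSim targets is commonly reduced to nearest-neighbor matching in an embedding space; here is a minimal sketch, with random vectors standing in for a real material encoder's output.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-9):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def one_shot_classify(query_emb, support_embs, support_labels):
    """Assign the query the label of its most similar support example.

    query_emb:      (d,) embedding of the unseen material image
    support_embs:   (n, d) one embedding per known material example
    support_labels: length-n list of material names
    """
    sims = l2_normalize(support_embs) @ l2_normalize(query_emb)  # cosine similarity
    return support_labels[int(np.argmax(sims))]

# Toy usage with random embeddings standing in for a real material encoder.
rng = np.random.default_rng(0)
support = rng.normal(size=(3, 128))
labels = ["rust", "wet_concrete", "brushed_steel"]
print(one_shot_classify(rng.normal(size=128), support, labels))
```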
EBHI-Seg is a dataset containing 5,170 images of six tumor differentiation stages together with the corresponding ground-truth images. The dataset supports the development and evaluation of new segmentation algorithms for the medical diagnosis of colorectal cancer.
SolarDK is a dataset for the detection and localization of solar panels. It comprises images from GeoDanmark with a variable Ground Sample Distance (GSD) between 10 cm and 15 cm, all sampled between March 1 and May 1, 2021. It contains 23,417 hand-labelled images for classification and 880 segmentation masks, in addition to a set of more than 100,000 images for classification covering most variations of Danish urban and rural landscapes.
CLEVR Mental Rotation Tests (CLEVR-MRT) is a new version of the CLEVR dataset. For each scene, it contains 20 images generated at a constant altitude while sampling over the azimuthal angle. It provides a controlled setting in which questions are posed about the properties of a scene as if that scene were observed from another viewpoint.
The Computer Vision Arxiv Figures dataset consists of 88,645 images that closely resemble the structure of the authors' visual prompts. The dataset was collected from arXiv, the open-access web archive for scholarly articles from a variety of academic fields.
The LIB-HSI dataset contains hyperspectral reflectance images and their corresponding RGB images of building façades in a light industrial environment. The dataset also contains pixel-level annotated images for each hyperspectral/RGB image. The LIB-HSI dataset was created to develop deep learning methods for segmenting building façade materials.
DialogCC is a large-scale multi-modal dialogue dataset covering diverse real-world topics, with multiple images per dialogue. It contains 651k unique images and is designed for image and text retrieval tasks.
REAP is a digital benchmark that allows the user to evaluate patch attacks on real images under real-world conditions. Built on top of the Mapillary Vistas dataset, the benchmark contains over 14,000 traffic signs. Each sign is augmented with a pair of geometric and lighting transformations, which can be used to apply a digitally generated patch realistically onto the sign.
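The composition step REAP describes (a geometric transform to warp the patch onto the sign, plus a lighting transform to match the scene) can be sketched roughly as below. The corner coordinates and the simple gain/bias relighting model are assumptions for illustration, not the benchmark's actual API.

```python
import numpy as np
import cv2  # OpenCV for the perspective warp

def apply_patch(scene, patch, corners, gain=0.9, bias=10.0):
    """Composite an adversarial patch onto a sign in a scene image.

    corners: (4, 2) pixel coordinates of the sign region in the scene,
             ordered to match the patch corners (geometric transform).
    gain, bias: simple per-image relighting approximating scene lighting.
    """
    h, w = patch.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    H = cv2.getPerspectiveTransform(src, np.float32(corners))

    relit = np.clip(patch.astype(np.float32) * gain + bias, 0, 255)
    warped = cv2.warpPerspective(relit, H, (scene.shape[1], scene.shape[0]))

    # Warp an all-ones mask the same way to know where the patch landed.
    mask = cv2.warpPerspective(np.ones((h, w), np.float32), H,
                               (scene.shape[1], scene.shape[0]))[..., None]
    return (scene * (1 - mask) + warped * mask).astype(np.uint8)
```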
BeautyFace is a dataset containing 3,000 high-quality face images at a resolution of 512×512, covering recent makeup styles and diverse face poses, backgrounds, expressions, races, and illumination conditions. Each face has an annotated parsing map.
Accidental Turntables contains a challenging set of 41,212 images of cars with cluttered backgrounds, motion blur, and illumination changes, serving as a benchmark for 3D pose estimation.
This dataset is the image stimulus pool of 50 deepfake and 50 real images, used for the experiment in the study titled "Testing Human Ability To Detect Deepfake Images of Human Faces".
The Rotterdam EyePACS AIROGS dataset (in full, i.e., including both the training and test sets) contains 113,893 color fundus images from 60,357 subjects across approximately 500 different sites with heterogeneous ethnicities.
Histological images of colorectal cancer, derived from the TCGA database.
To encourage reproducible research, a labeled MultiRAW dataset containing more than 7,000 RAW images acquired with multiple camera sensors is made publicly accessible for RAW-domain processing.
The FETA benchmark focuses on text-to-image and image-to-text retrieval in public car manuals and sales catalogue brochures. The FETA Car-Manuals dataset consists of a total of 349 PDF documents from 5 car manufacturers, namely Nissan, Toyota, Mazda, Renault, and Chevrolet.
The FETA IKEA dataset, part of the same benchmark, contains 26 documents with 7,366 pages in total, from which approximately 9,574 images and 23,927 texts were automatically extracted.
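Both FETA tasks ultimately rank one modality's embeddings against the other's; a minimal cosine-similarity retrieval sketch follows, with random vectors standing in for whatever text/image encoder one trains.

```python
import numpy as np

def retrieve(queries, gallery, top_k=5):
    """Rank gallery items for each query by cosine similarity.

    For text-to-image retrieval, `queries` are text embeddings and
    `gallery` are image embeddings; swap them for image-to-text.
    Returns an array of shape (n_queries, top_k) of gallery indices.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = q @ g.T                               # (n_queries, n_gallery)
    return np.argsort(-sims, axis=1)[:, :top_k]  # best matches first

# Toy usage: 3 text queries against 100 page images, random stand-in features.
rng = np.random.default_rng(1)
print(retrieve(rng.normal(size=(3, 64)), rng.normal(size=(100, 64))))
```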