Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

3,275 dataset results
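The modality filter above amounts to tag-based faceted filtering: each dataset carries one or more modality tags, which is why the per-modality counts overlap (every dataset here is tagged Images, so that count equals the full catalog size). A minimal sketch of the idea, using a tiny illustrative sample rather than the real index:

```python
# Minimal sketch of a tag-based modality filter. Each dataset carries a set
# of modality tags; filtering keeps datasets that include the chosen tag.
# The catalog below is a small illustrative sample, not the real index.
catalog = [
    {"name": "GLAMI-1M", "modalities": {"Images", "Texts"}},
    {"name": "S-ODv2", "modalities": {"Images"}},
    {"name": "EBHI-Seg", "modalities": {"Images", "Medical"}},
    {"name": "LIB-HSI", "modalities": {"Hyperspectral images", "Images"}},
]

def filter_by_modality(catalog, modality):
    """Return the datasets tagged with the given modality."""
    return [d for d in catalog if modality in d["modalities"]]

def modality_counts(catalog):
    """Per-modality counts; they overlap because datasets are multi-tagged."""
    counts = {}
    for d in catalog:
        for m in d["modalities"]:
            counts[m] = counts.get(m, 0) + 1
    return counts

medical = filter_by_modality(catalog, "Medical")
```

Because a dataset can appear under several modalities, the facet counts sum to more than the number of datasets.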

GLAMI-1M (A Multilingual Image-Text Fashion Dataset)

We introduce GLAMI-1M: the largest multilingual image-text classification dataset and benchmark. The dataset contains images of fashion products with item descriptions, each in one of 13 languages. Categorization into 191 classes has high-quality annotations: all 100k images in the test set and 75% of the 1M training set were human-labeled. The paper presents baselines for image-text classification showing that the dataset poses a challenging fine-grained classification problem: the best-scoring EmbraceNet model, using both visual and textual features, achieves 69.7% accuracy. Experiments with a modified Imagen model show the dataset is also suitable for image generation conditioned on text.

1 paper · 6 benchmarks · Images, Texts

S-ODv2 (SeaDronesSee-Object Detection v2)

The SeaDronesSee-Object Detection v2 (S-ODv2) dataset contains 14,227 RGB images (training: 8,930; validation: 1,547; testing: 3,750). The images are captured from altitudes of 5 to 260 meters and viewing angles (gimbal pitch) of 0 to 90 degrees, with the respective meta information for altitude, viewing angle, and other metadata provided for almost all frames.

1 paper · 0 benchmarks · Images
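The S-ODv2 split sizes quoted above are internally consistent: 8,930 + 1,547 + 3,750 = 14,227. A quick sketch of the split proportions:

```python
# Sanity check of the S-ODv2 split sizes quoted above, plus their proportions.
splits = {"training": 8930, "validation": 1547, "testing": 3750}

total = sum(splits.values())
assert total == 14227  # matches the stated dataset size

# Fraction of the dataset in each split (roughly 63% / 11% / 26%).
fractions = {name: n / total for name, n in splits.items()}
```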

Apron Dataset

The Apron Dataset focuses on training and evaluating classification and detection models for airport-apron logistics. In addition to bounding boxes and object categories, the dataset is enriched with meta parameters to quantify the models' robustness against environmental influences.

1 paper · 0 benchmarks · Images

UMD-i Affordance Dataset

A one-shot affordance part segmentation variant of the UMD dataset. Each object instance in the dataset contains a single image.

1 paper · 0 benchmarks · Images, RGB-D

MatSim (MatSim dataset for materials similarity recognition from images)

MatSim is a synthetic dataset and natural-image benchmark for computer-vision-based recognition of similarities and transitions between materials and textures. It focuses on identifying any material under any conditions from one or a few examples (one-shot learning), including material states and subclasses.

1 paper · 0 benchmarks · Images

EBHI-Seg

EBHI-Seg is a dataset containing 5,170 images spanning six tumor differentiation stages, together with the corresponding ground-truth images. The dataset supports the development of new segmentation algorithms for the medical diagnosis of colorectal cancer.

1 paper · 0 benchmarks · Images, Medical

SolarDK

SolarDK is a dataset for the detection and localization of solar panels. It comprises images from GeoDanmark with a variable Ground Sample Distance (GSD) between 10 cm and 15 cm, all sampled between March 1st and May 1st, 2021. It contains 23,417 hand-labelled images for classification and 880 segmentation masks, in addition to a set of more than 100,000 images for classification covering most variations of Danish urban and rural landscapes.

1 paper · 0 benchmarks · Images

CLEVR-MRT (CLEVR: Mental Rotation Tests)

CLEVR Mental Rotation Tests (CLEVR-MRT) is a new version of the CLEVR dataset. It contains 20 images generated for each scene, holding altitude constant and sampling over the azimuthal angle. It is a controlled setting in which questions are posed about the properties of a scene as if that scene were observed from another viewpoint.

1 paper · 0 benchmarks · Images, Texts

Computer Vision Arxiv Figures

The Computer Vision Arxiv Figures dataset consists of 88,645 images that closely resemble the structure of the visual prompts used in the associated paper. The dataset was collected from arXiv, the open-access web archive for scholarly articles from a variety of academic fields.

1 paper · 0 benchmarks · Images

LIB-HSI (RGB and Hyperspectral images of Building Facades)

The LIB-HSI dataset contains hyperspectral reflectance images and their corresponding RGB images of building façades in a light industrial environment. The dataset also contains pixel-level annotated images for each hyperspectral/RGB image. The LIB-HSI dataset was created to develop deep learning methods for segmenting building facade materials.

1 paper · 0 benchmarks · Hyperspectral images, Images

DialogCC

DialogCC is a large-scale multi-modal dialogue dataset, which covers diverse real-world topics and various images per dialogue. It contains 651k unique images and is designed for image and text retrieval tasks.

1 paper · 0 benchmarks · Dialog, Images

REAP

REAP is a digital benchmark that allows the user to evaluate patch attacks on real images, and under real-world conditions. Built on top of the Mapillary Vistas dataset, the benchmark contains over 14,000 traffic signs. Each sign is augmented with a pair of geometric and lighting transformations, which can be used to apply a digitally generated patch realistically onto the sign.

1 paper · 0 benchmarks · Images

BeautyFace

BeautyFace is a dataset containing 3,000 high-quality face images at a resolution of 512×512, covering recent makeup styles and diverse face poses, backgrounds, expressions, races, and illumination conditions. Each face has an annotated parsing map.

1 paper · 0 benchmarks · Images

Accidental Turntables

Accidental Turntables contains a challenging set of 41,212 images of cars with cluttered backgrounds, motion blur, and illumination changes, serving as a benchmark for 3D pose estimation.

1 paper · 0 benchmarks · Images, Videos

Selection from FFHQ & StyleGAN2:FFHQ (used in "Testing Human Ability To Detect Deepfake Images of Human Faces" study)

This dataset is the image stimulus pool of 50 deepfake and 50 real images used for the experiment in the study "Testing Human Ability To Detect Deepfake Images of Human Faces".

1 paper · 0 benchmarks · Images

AIROGS (Rotterdam EyePACS AIROGS)

The Rotterdam EyePACS AIROGS dataset (in full, including train and test) contains 113,893 color fundus images from 60,357 subjects and approximately 500 different sites, with heterogeneous ethnicities.

1 paper · 0 benchmarks · Images, Medical

CRCDX (TCGA-CRC-DX)

Histological images of colorectal cancer, derived from the TCGA database.

1 paper · 0 benchmarks · Images, Medical

multiRAW

To encourage reproducible research, the labeled MultiRAW dataset, containing more than 7k RAW images acquired with multiple camera sensors, is made publicly accessible for RAW-domain processing.

1 paper · 0 benchmarks · Images

FETA Car-Manuals (image-text retrieval on expert data for foundation models)

The FETA benchmark focuses on text-to-image and image-to-text retrieval in public car manuals and sales catalogue brochures. The FETA Car-Manuals dataset consists of a total of 349 PDF documents from 5 car manufacturers: Nissan, Toyota, Mazda, Renault, and Chevrolet.

1 paper · 6 benchmarks · Images, Texts

FETA IKEA

The FETA benchmark focuses on text-to-image and image-to-text retrieval in public car manuals and sales catalogue brochures. The FETA IKEA dataset contains 26 documents with 7,366 pages in total, plus approximately 9,574 images and 23,927 texts automatically extracted from those pages.

1 paper · 0 benchmarks · Images, Texts
Page 127 of 164