Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

3,275 dataset results

IllusionFashionMNIST_test

IllusionFashionMNIST_test is a generated dataset derived from FashionMNIST. It incorporates the concept of pareidolia, a phenomenon in which patterns, often faces, are perceived in random or abstract stimuli. The dataset contains 11 classes: the original 10 FashionMNIST classes plus an additional "No Illusion" class. It includes 1,267 samples, all synthetically created rather than real-world images.

1 paper · 0 benchmarks · Images, Texts

IllusionAnimals_test

IllusionAnimals_test is a generated dataset based on a synthetic collection of animal images covering 10 animal classes: cat, dog, pigeon, butterfly, elephant, horse, deer, snake, fish, and rooster. An additional "No Illusion" class brings the total to 11 classes. The dataset contains 1,100 samples, all created synthetically rather than derived from real-world images.

1 paper · 0 benchmarks · Images, Texts

IllusionChar_test

IllusionChar_test is a generated dataset of 3,300 images, each featuring a sequence of 3 to 5 random characters. Unlike classification-focused datasets, it is designed for tasks that require reasoning about patterns, sequences, or illusions within the character sequences. All images are synthetically generated; no real-world data is included.

1 paper · 0 benchmarks · Images, Texts

EGC-FPHFS (Early Gastric Cancer Data from First People's Hospital of Foshan)

A dataset for high-resolution early gastric cancer (EGC) detection and analysis. Patient data: images from patients diagnosed with gastric cancer, distinguishing between early gastric cancer (EGC) and non-pathogenic gastric cancer (NGC). The study utilized data from 341 patients, with 124 classified as EGC and 217 as NGC. Image types: high-resolution images obtained via endoscopy. Data volume: 1,120 images for EGC detection and 2,150 images for NGC.

1 paper · 1 benchmark · Images, Medical

MS-HAB-Demonstrations (ManiSkill-HAB Demonstration Datasets)

A whole-body, low-level control/manipulation demonstration dataset for ManiSkill-HAB. Demonstrations are organized by task, subtask, and object. All demos use RGBD (128×128) observations and state. JSON files store metadata (including event labels and success/failure modes), while HDF5 files store the demonstration data.

1 paper · 0 benchmarks · Actions, Images, RGB-D, Replay data

UAVDB (Trajectory-Guided Adaptable Bounding Boxes for UAV Detection)

UAVDB is a high-resolution RGB video dataset designed for UAV detection across diverse scales and complex backgrounds. Comprising 10,763 training, 2,720 validation, and 4,578 test images (18,061 in total) drawn from multiple source datasets and camera configurations, it addresses key limitations of existing datasets, such as inaccurate bounding-box annotations and limited diversity of environmental contexts, thereby improving the reliability and real-world applicability of detection algorithms.

1 paper · 1 benchmark · Images

MVTec-FS (MVTec few-shot detection and classification dataset)

The MVTec-FS dataset is a refined version of the MVTec AD dataset, designed for few-shot learning research. It contains instance-level annotations of anomaly images and is tailored to support tasks such as few-shot anomaly detection and classification.

1 paper · 0 benchmarks · Images

SIRST-UAVB (Single-frame infrared small-target dataset: UAVs and birds)

Infrared dim-small target detection has gained increasing importance in both military and civilian applications due to its ability to detect thermal radiation, operate effectively at night, sense radiation passively, and offer strong concealment with high resistance to interference. These capabilities make it well suited to systems such as aircraft and bird surveillance, missile guidance, and maritime rescue. In these applications, the need for mid- to long-range observation often results in small targets that appear dim and are difficult to detect. This dataset, SIRST-UAVB, provides infrared images captured in the 3–5 μm wavelength range using a mid-wave infrared camera, with a resolution of 640×512 pixels and shooting distances ranging from 100 to 800 meters. Small targets, including unmanned aerial vehicles (UAVs) and birds, make up 94.3% of the data and appear against complex backgrounds such as skies.

1 paper · 0 benchmarks · Images

AODRaw (Adverse condition Object Detection with RAW images)

We introduce the AODRaw dataset, which offers 7,785 high-resolution real RAW images with 135,601 annotated instances spanning 62 categories, capturing a broad range of indoor and outdoor scenes under 9 distinct light and weather conditions. AODRaw supports RAW and sRGB object detection.

1 paper · 5 benchmarks · Images

MMComposition

MMComposition is a high-quality benchmark specifically designed to comprehensively evaluate the compositionality of pre-trained Vision-Language Models (VLMs) across three main dimensions (VL compositional perception, reasoning, and probing), which are further divided into 13 distinct categories of questions. While previous benchmarks have mainly focused on text-to-image retrieval, single-choice questions, and open-ended text generation, MMComposition introduces a more diverse and challenging set of 4,342 tasks covering both single-image and multi-image scenarios, as well as single-choice and indefinite-choice formats. This expanded range of tasks aims to capture the complex interplay between vision and language more effectively, surpassing earlier benchmarks such as ARO and Winoground by providing a more comprehensive and in-depth assessment of models' cross-modal compositional capabilities.

1 paper · 0 benchmarks · Images, Texts

Plancraft

An evaluation dataset for planning with LLM agents.

1 paper · 0 benchmarks · Environment, Images, Texts

MapEval-Visual

MapEval-Visual contains 400 image-question-answer triplets. Each question is paired with a snapshot from the Google Maps website; the task is to answer the question based on the provided map snapshot.

1 paper · 2 benchmarks · Images, Texts

NYUDv2-IS

An RGB-D dataset converted from NYUDv2 into COCO-style instance segmentation format. To construct NYUDv2-IS, specifically tailored for instance segmentation, we generated instance masks that delineate individual objects in each image. These masks were labeled using the object class annotations provided in the original NYUDv2 dataset, which is distributed in MATLAB format. The process involved several key steps: (1) extracting binary instance masks, (2) converting these masks into polygon representations, and (3) generating COCO-style annotations. Each annotation includes essential attributes such as category ID, segmentation masks, bounding boxes, object areas, and image metadata. During this conversion, we focused on 9 of the original 13 classes, excluding non-instance categories such as walls and floors. To ensure dataset quality, images without any object annotations were systematically removed.
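The mask-to-COCO step of such a conversion can be sketched as follows. This is a minimal illustration, not the authors' code: `mask_to_coco_ann` is a hypothetical helper, and the polygon extraction of step (2), which typically relies on `cv2.findContours`, is omitted here so the sketch stays self-contained.

```python
def mask_to_coco_ann(mask, ann_id, image_id, category_id):
    """Build a COCO-style annotation dict from one binary instance mask
    (a 2D list of 0/1). The 'segmentation' polygon field, step (2) above,
    is omitted; bbox and area are derived directly from the mask pixels."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    x0, y0 = min(xs), min(ys)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1],  # COCO [x, y, w, h]
        "area": len(pts),  # instance pixel area
        "iscrowd": 0,
    }

# Toy 5x5 mask containing one 2x3 instance.
mask = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0]]
ann = mask_to_coco_ann(mask, ann_id=1, image_id=7, category_id=3)
print(ann["bbox"], ann["area"])  # [1, 1, 3, 2] 6
```

One such dict per instance, collected alongside the image metadata and a category list, is what the final COCO-style JSON file stores.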

1 paper · 1 benchmark · Images, RGB-D

SUN-RGBD-IS

An RGB-D dataset converted from SUN-RGBD into COCO-style instance segmentation format. To transform SUN-RGBD into an instance segmentation benchmark (i.e., SUN-RGBD-IS), we employed a pipeline similar to that of NYUDv2-IS. We selected 17 categories from the original 37 classes, carefully omitting non-instance categories such as ceilings and walls. Images lacking any identifiable object instances were filtered out to keep the dataset relevant for instance segmentation tasks. We systematically converted the segmentation annotations into COCO format, generating precise bounding boxes, instance masks, and object attributes.

1 paper · 1 benchmark · Images, RGB-D

Box-IS

An RGB-D instance segmentation dataset of boxes. The Box-IS dataset was created to support research on human-robot collaboration, with a focus on robotic manipulation tasks. It was captured using the Intel® RealSense™ Depth Camera D455, a high-performance depth-imaging sensor. To ensure precise depth measurements, we bypassed the sensor's default depth processing and performed accurate stereo matching directly on the captured left and right IR images. Employing the UniMatch technique, we derived a high-quality depth map from these stereo IR images, which was then aligned with the corresponding RGB image for a comprehensive output. The dataset was intentionally designed to span a broad range of scene complexities, from simple box arrangements to highly irregular configurations, so that it can effectively benchmark algorithms across varying levels of difficulty.

1 paper · 1 benchmark · Images, RGB-D

COph100

COph100 is a novel and challenging Comprehensive Ophthalmology retinal image registration dataset for infants, covering a wide range of image quality issues and drawn from the public RIDIRP database. COph100 consists of 100 eyes, each with 2 to 9 examination sessions, for a total of 491 image pairs carefully selected from the publicly available dataset. We manually labeled the corresponding ground-truth image points and provide automatic vessel segmentation masks for each image.

1 paper · 0 benchmarks · Images

IUST_PersonReID

The IUST_PersonReID dataset was developed to address limitations in existing person re-identification datasets by including cultural and environmental contexts unique to Islamic countries, especially Iran and Iraq. Unlike common datasets, which do not reflect the clothing styles common in these regions, such as hijabs and other coverings, IUST_PersonReID represents this diversity, helping to reduce demographic bias and improve model accuracy. Collected from various real-world settings under different lighting, camera angles, indoor and outdoor locations, and weather conditions, the dataset provides extensive, overlapping views across multiple cameras. By capturing these unique conditions, IUST_PersonReID offers a valuable resource for developing re-ID models that perform reliably across diverse environments and populations.

1 paper · 4 benchmarks · Images

WayveScenes101

A dataset of 101 diverse driving scenes for novel view synthesis research in autonomous driving.

1 paper · 0 benchmarks · Images

SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers

SPIQA (Scientific Paper Image Question Answering) is a dataset for multimodal question answering on scientific papers.

1 paper · 0 benchmarks · Images, Texts

Changen2-S9-27k


1 paper · 0 benchmarks · Images
Page 144 of 164