Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

3,275 dataset results

EyePACS-light (v2) (EyePACS-AIROGS-light-v2)

This is an improved, machine-learning-ready glaucoma dataset built from a balanced subset of standardized fundus images from the Rotterdam EyePACS AIROGS [1] set. It is split into training, validation, and test folders containing 4000 (~84%), 385 (~8%), and 385 (~8%) fundus images per class, respectively. Each split has a folder for each class: referable glaucoma (RG) and non-referable glaucoma (NRG).

0 papers · 0 benchmarks · Images, Medical
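The EyePACS-light (v2) split sizes are given per class, so each split's share can be checked directly from the counts: 4000 / 4770 rounds to 84% and 385 / 4770 to 8%. The sketch below is a minimal Python example, assuming a root/<split>/<class>/ layout on disk. The folder names (train, validation, test, RG, NRG) and the image extensions are hypothetical, since the description does not state them.

```python
from pathlib import Path

# Per-class image counts for each split, as stated in the description.
SPLIT_COUNTS = {"train": 4000, "validation": 385, "test": 385}

# Hypothetical directory names and file extensions (not given in the source).
SPLITS = ("train", "validation", "test")
CLASSES = ("RG", "NRG")
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def split_fractions(counts):
    """Return each split's share of the per-class total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def count_images(root):
    """Count image files in root/<split>/<class>/, using 0 for missing folders."""
    counts = {}
    for split in SPLITS:
        for cls in CLASSES:
            folder = Path(root) / split / cls
            if folder.is_dir():
                counts[(split, cls)] = sum(
                    1
                    for p in folder.iterdir()
                    if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
                )
            else:
                counts[(split, cls)] = 0
    return counts


if __name__ == "__main__":
    # Prints train: 83.9%, validation: 8.1%, test: 8.1%
    for name, frac in split_fractions(SPLIT_COUNTS).items():
        print(f"{name}: {frac:.1%}")
```

Running count_images on a local copy and comparing it with SPLIT_COUNTS is a quick way to confirm that a download is complete and balanced.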

ALFI (Annotations for Label-Free Images)

ALFI (Annotations for Label-Free Images) is a dataset of images and annotations for label-free microscopy imaging. It consists of 29 time-lapse image sequences with various annotations (pixel-wise segmentation masks, object-wise bounding boxes, and tracking information), made publicly available to the scientific community through figshare.

0 papers · 0 benchmarks · Biology, Images, Texts, Tracking

AASCE (Accurate Automated Spinal Curvature Estimation)

The purpose of this challenge is to investigate (semi-)automatic spinal curvature estimation algorithms. Participants must submit Cobb angle results for all test data.

0 papers · 0 benchmarks · Images, Medical

DREAMING Inpainting Dataset (Diminished Reality for Emerging Applications in Medicine through Inpainting Dataset)

Dataset for the DREAMING (Diminished Reality for Emerging Applications in Medicine through Inpainting) challenge.

0 papers · 0 benchmarks · Biomedical, Images, Medical, RGB Video, Videos

WM-300K+ wafer map [Single & Mixed]


0 papers · 0 benchmarks · Images

OCT5k

The thickness and appearance of retinal layers are essential markers for diagnosing and studying eye diseases. Despite the increasing availability of imaging devices that scan and store large amounts of data, analyzing retinal images and generating trial endpoints has remained a manual, error-prone, and time-consuming task. In particular, the lack of large amounts of high-quality labels for different diseases hinders the development of automated algorithms. Therefore, we have compiled 5016 pixel-wise manual labels for 1672 optical coherence tomography (OCT) scans featuring two different diseases as well as healthy subjects, to help democratize the development of novel automatic techniques. We also collected 4698 bounding box annotations for a subset of 566 scans across 9 classes of disease biomarkers. Due to variations in retinal morphology, intensity range, and changes in contrast and brightness, designing segmentation and detection methods that can generalize to different disease…

0 papers · 0 benchmarks · Images

CBLPRD-330k (China-Balanced-License-Plate-Recognition-Dataset-330k)

A balanced dataset of 330,000 images of various types of Chinese license plates. The images were generated using Generative Adversarial Networks (GANs) to provide high image quality and a balanced distribution of license plate types. The dataset is intended for training and evaluating license plate recognition models.

0 papers · 0 benchmarks · Images

ABODA (Abandoned Object Dataset)

ABandoned Objects DAtaset (ABODA) is a new public dataset for abandoned object detection. ABODA comprises 11 labeled sequences covering various real-world application scenarios that are challenging for abandoned-object detection, including crowded scenes, marked changes in lighting conditions, night-time detection, and both indoor and outdoor environments.

0 papers · 0 benchmarks · Images, Videos

Mudestreda (Mudestreda Multimodal Device State Recognition Dataset)

The Mudestreda Multimodal Device State Recognition Dataset was obtained from a real industrial milling device and contains time series and image data for Classification, Regression, Anomaly Detection, Remaining Useful Life (RUL) estimation, Signal Drift measurement, Zero-Shot Flank Tool Wear, and Feature Engineering purposes.

0 papers · 0 benchmarks · Audio, Images, Time series

DMS (Dense Material Segmentation Dataset)

The Dense Material Segmentation Dataset (DMS) consists of 3 million polygon labels of material categories (metal, wood, glass, etc.) for 44 thousand RGB images. The dataset is described in the research paper "A Dense Material Segmentation Dataset for Indoor and Outdoor Scene Parsing".

0 papers · 0 benchmarks · Images

MIMIC Meme Dataset (Misogyny Identification in Multimodal Internet Content in Hindi-English Code-Mix Language)

This dataset addresses a gap in research by providing a curated collection of misogynistic memes in code-mixed Hindi-English. It introduces two subtasks: the first is binary classification of whether a meme contains misogyny, and the second is multi-label categorization of misogynistic memes into labels including Objectification, Prejudice, and Humiliation.

0 papers · 0 benchmarks · Images, Texts

tomato detection (A dataset of tomato fruits images for object detection in the complex lighting environment of plant factories)

Plant factories are an advanced form of facility agriculture that enable efficient plant cultivation through controllable environmental conditions, making them well suited to automation and intelligent machinery. Tomato cultivation in plant factories has significant economic and agricultural value and supports applications such as seedling cultivation, breeding, and genetic engineering. However, operations such as detecting, counting, and classifying tomato fruits are still performed manually, and machine detection is currently inefficient. Furthermore, research on automating tomato harvesting in plant factory environments is limited by the lack of a suitable dataset. To address this issue, a tomato fruit dataset for plant factory environments, named TomatoPlantfactoryDataset, was constructed; it can be quickly applied to multiple tasks, including the detection of control systems, harvesting…

0 papers · 0 benchmarks · Images

WildlifeReID-10k


0 papers · 0 benchmarks · Biology, Images

CloudSEN12

CloudSEN12 is a large dataset (~1 TB) for cloud semantic understanding that consists of 49,400 image patches (IP) distributed across all continents except Antarctica. Each IP covers 5090 x 5090 meters and contains data from Sentinel-2 levels 1C and 2A, hand-crafted annotations of thick and thin clouds and cloud shadows, Sentinel-1 Synthetic Aperture Radar (SAR) data, a digital elevation model, surface water occurrence, land cover classes, and cloud mask results from six state-of-the-art cloud detection algorithms.

0 papers · 0 benchmarks · Images
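The CloudSEN12 figures (49,400 patches of 5090 x 5090 m, about 1 TB in total) allow a quick back-of-the-envelope estimate of per-patch area and storage. This is a minimal Python sketch under two stated assumptions: "~1 TB" is read as 10^12 bytes, and the summed patch area equals the covered area only if patches do not overlap, which the description does not say.

```python
# Figures taken from the CloudSEN12 description.
PATCH_SIDE_M = 5090      # each image patch covers 5090 m x 5090 m
NUM_PATCHES = 49_400     # number of image patches
DATASET_BYTES = 1e12     # "~1 TB", assumed to mean 10**12 bytes

# Ground area of one patch, in square kilometres (about 25.91 km^2).
patch_area_km2 = (PATCH_SIDE_M / 1000) ** 2

# Sum of all patch areas; an upper bound on covered area if patches overlap.
total_area_km2 = patch_area_km2 * NUM_PATCHES

# Rough average storage per patch, in megabytes (about 20.2 MB).
avg_patch_mb = DATASET_BYTES / NUM_PATCHES / 1e6

if __name__ == "__main__":
    print(f"patch area: {patch_area_km2:.2f} km^2")
    print(f"summed patch area: {total_area_km2:,.0f} km^2")
    print(f"average size per patch: {avg_patch_mb:.1f} MB")
```

The roughly 20 MB per patch reflects the many co-registered layers (Sentinel-1, Sentinel-2, elevation, land cover, and cloud masks) stored for each location.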

Bone Fracture Multi-Region X-ray Data

This dataset comprises fractured and non-fractured X-ray images covering all anatomical body regions, including the lower limb, upper limb, lumbar region, hips, and knees. It is organized into train, test, and validation folders, each containing fractured and non-fractured radiographic images.

0 papers · 0 benchmarks · Images, Medical

Bone Fracture Multi-Region X-ray Dataset

This dataset consists of both fractured and non-fractured X-ray images encompassing various anatomical regions of the body, such as the lower limb, upper limb, lumbar region, hips, and knees. It is organized into three main folders (train, test, and validation), each containing both fractured and non-fractured radiographic images. The dataset is freely available at https://www.kaggle.com/datasets/bmadushanirodrigo/fracture-multi-region-x-ray-data/data

0 papers · 0 benchmarks · Images, Medical
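The Bone Fracture Multi-Region X-ray dataset uses a split/class/file folder layout, so each image's split and binary label can be read from its relative path. The Python sketch below is illustrative only: the folder names "train", "validation", "test", "fractured", and "not fractured" are hypothetical, since the description names the splits and classes but not the exact directory names.

```python
from pathlib import Path, PurePosixPath

# Hypothetical folder names (not confirmed by the dataset description).
SPLITS = {"train", "validation", "test"}
CLASS_TO_LABEL = {"fractured": 1, "not fractured": 0}


def parse_relative_path(rel_path):
    """Map 'split/class/file' to (split, label); reject unexpected layouts."""
    parts = PurePosixPath(rel_path).parts
    if len(parts) != 3:
        raise ValueError(f"expected split/class/file, got {rel_path!r}")
    split, cls, _filename = parts
    if split not in SPLITS or cls not in CLASS_TO_LABEL:
        raise ValueError(f"unknown split or class in {rel_path!r}")
    return split, CLASS_TO_LABEL[cls]


def list_samples(root):
    """Return sorted (path, split, label) tuples for every file under root."""
    root = Path(root)
    if not root.is_dir():
        return []
    samples = []
    for path in sorted(root.glob("*/*/*")):
        if path.is_file():
            split, label = parse_relative_path(path.relative_to(root).as_posix())
            samples.append((str(path), split, label))
    return samples


if __name__ == "__main__":
    print(parse_relative_path("train/fractured/example.jpg"))  # ('train', 1)
```

Raising on unexpected paths, rather than silently skipping them, surfaces mismatched folder names early instead of producing a mislabeled sample list.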

Dronescapes

A large video dataset captured with UAVs in diverse, complex real-world scenes, with multiple representations, suitable for multi-task learning.

0 papers · 0 benchmarks · Images, RGB Video

Intersection Markings Dataset

The Remote Sensing dataset contains the following key features for each annotated marking:

0 papers · 0 benchmarks · Images

FlareReal600

FlareReal600 is a nighttime flare removal dataset that contains 650 real-captured image pairs and 500 flare images. The training set contains 600 image pairs and the 500 flare images, and the validation set contains 50 image pairs. The image pairs were captured in various places (e.g., street, park, indoor) under both incorrect and correct exposure settings. Each flare-corrupted image contains multiple light sources. The flare images were captured in a dark room with light sources of multiple colors.

0 papers · 0 benchmarks · Images