Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

3,275 dataset results

TCMP-300 (Traditional Chinese Medicinal Plant Dataset)

Traditional Chinese medicinal plants are widely used to prevent and treat human diseases. Since different medicinal plants have different therapeutic effects, plant recognition is an important topic. Traditional identification of medicinal plants relies mainly on human experts, which cannot meet the growing demands of clinical practice. Artificial Intelligence (AI) research on plant recognition is hindered by the lack of a comprehensive medicinal plant dataset. We therefore present a Chinese medicinal plant dataset that includes 52,089 images in 300 categories. Compared to existing medicinal plant datasets, ours has more categories and fine-grained plant parts to facilitate comprehensive plant recognition. The plant images were collected via the Bing search engine and cleaned by a pretrained vision foundation model with human verification. Our dataset promotes the development and validation of advanced AI models for robust and accurate plant recognition.

0 papers · 1 benchmark · Images

FortisAVQA

We introduce FortisAVQA, a dataset designed to assess the robustness of AVQA models. Its construction involves two key processes: rephrasing and splitting. Rephrasing modifies questions from the test set of MUSIC-AVQA to enhance linguistic diversity, thereby mitigating the reliance of models on spurious correlations between key question terms and answers. Splitting entails the automatic and reasonable categorization of questions into frequent (head) and rare (tail) subsets, enabling a more comprehensive evaluation of model performance in both in-distribution and out-of-distribution scenarios.

0 papers · 0 benchmarks · Audio, Images, Texts, Videos

Colorectal-Liver-Metastases (Colorectal-Liver-Metastases | Preoperative CT and Survival Data for Patients Undergoing Resection of Colorectal Liver Metastases)

This collection consists of DICOM images and DICOM Segmentation Objects (DSOs) for 197 patients with Colorectal Liver Metastases (CRLM). It is a large, single-institution consecutive series of patients who underwent resection of CRLM, with matched preoperative computed tomography (CT) scans for quantitative image analysis. Inclusion criteria were (a) pathologically confirmed resected CRLM, (b) available data from pathologic analysis of the underlying non-tumoral liver parenchyma and hepatic tumor, and (c) an available preoperative conventional portal venous contrast-enhanced multi-detector computed tomography (MDCT) scan performed within 6 weeks of hepatic resection. Patients who died within 90 days or had less than 24 months of follow-up were excluded. Additionally, because pathologic and radiographic alterations of the non-tumoral liver parenchyma caused by hepatic artery infusion (HAI) of chemotherapy are not well described, any patient who received preoperative HAI was excluded.

0 papers · 0 benchmarks · 3D, Biomedical, Images, Medical

WTA/TLA (WTA/TLA: A UAV-captured Dataset for Semantic Segmentation of Energy Infrastructure)

WTA (Wind Turbine Aerial) and TLA (Transmission Line Aerial) are public datasets containing RGB images of wind turbine farms, transmission towers, and power lines, along with semantic ground truth for the relevant classes. This is the official repository of the paper: WTA/TLA: A UAV-captured Dataset for Semantic Segmentation of Energy Infrastructure (url).

0 papers · 0 benchmarks · Images

CODrone

Applications of unmanned aerial vehicles (UAVs) in logistics, agricultural automation, urban management, and emergency response depend heavily on oriented object detection (OOD) for visual perception. Although existing UAV OOD datasets provide valuable resources, they are often designed for specific downstream tasks; consequently, they generalize poorly to real flight scenarios and fail to thoroughly demonstrate algorithm effectiveness in practical environments. To bridge this critical gap, we introduce CODrone, a comprehensive oriented object detection dataset for UAVs that accurately reflects real-world conditions. It also serves as a new benchmark aligned with downstream task requirements, ensuring greater applicability and robustness in UAV-based OOD. Based on application requirements, we identify four key limitations in current UAV OOD datasets: low image resolution, limited object categories, single-view imaging, and restricted

0 papers · 0 benchmarks · Images

PALMS

Data in this study come from western Ecuador's Choco tropical forest, including the Fundación para la Conservación de los Andes Tropicales Reserve and the adjacent Reserva Ecológica Mache-Chindul park (FCAT; 00°23′28″ N, 79°41′05″ W), the Jama-Coaque Ecological Reserve (00°06′57″ S, 80°07′29″ W), the Canande Reserve (0°31′34″ N, 79°12′47″ W), and the Tesoro Escondido Reserve (0°33′16″ N, 79°10′31″ W). FCAT is a high-diversity humid tropical forest at ~500 m elevation, receiving ~3000 mm of precipitation per year with persistent fog during drier periods. Jama-Coaque ranges from the boundary of tropical moist deciduous/tropical moist evergreen forest at lower elevations (~1000 mm precipitation per year, ~250 m asl) to fog-inundated wet evergreen forests from 580 m to 800 m. Canande (350–500 m elevation) and Tesoro Escondido (~200 m elevation) are lowland everwet Choco forests, both

0 papers · 0 benchmarks · Images

Potato Plant Diseases Data


0 papers · 0 benchmarks · Images

Shanghai2020 (Shanghai-2020 Dataset)

Released by the Shanghai Central Meteorological Observatory (SCMO) in 2020, the dataset records several years of historical precipitation events in the Yangtze River delta area. It contains a total of 43,000 samples of precipitation events, of which 40,000 are for training and 3,000 for testing. Each sample consists of 20 consecutive radar echo frames and spans 3 hours, where the first 10 frames are at 6-minute intervals and the last 10 frames are at 12-minute intervals. Each echo frame has a 460 × 460 resolution and covers a 460 km × 398 km region. We additionally split out 3,000 samples from the training set for validation.

0 papers · 0 benchmarks · Images, Videos

Burmese Handwritten Digit Dataset (BHDD)

The Burmese Handwritten Digit Dataset (BHDD) is a dataset created specifically for recognizing handwritten Burmese digits. It is a Burmese counterpart of the MNIST dataset, with a training set of 60,000 examples and a test set of 27,561 examples.

0 papers · 0 benchmarks · Images

Tornet (Tornado Network)

The Tornado Network (TorNet) dataset is a large, high-resolution benchmark dataset developed to support machine learning research in tornado detection and prediction. It comprises over 200,000 radar samples derived from 9 years of full-resolution, polarimetric WSR-88D (NEXRAD) level-II and level-III radar data. Each sample, called a "chip," includes multiple radar variables—such as reflectivity, radial velocity, spectrum width, differential reflectivity, correlation coefficient, and specific differential phase—captured across two elevation angles and four time steps spaced five minutes apart. Rather than converting radar data to Cartesian coordinates, TorNet retains its native polar format, preserving spatial fidelity near the radar site. This level of detail enables the dataset to support a wide range of machine learning techniques, including deep learning models that can learn directly from raw radar imagery.

0 papers · 0 benchmarks · Images

PEARL30K

The PEARL dataset comprises 30K pedestrian images, each annotated with 25 attribute categories spanning 146 sub-attributes. We collected images from outdoor surveillance that reflect practical applications and challenges, comprehensively covering nearly all attributes critical to security surveillance, including body posture, accessories, bag types, clothing styles, colors, and activities. For diversity, we extracted images from twelve countries covering seven distinct types of public locations: streets, parks, airports, stations, college campuses, beaches, and marketplaces. We also incorporated four distinct weather conditions: sunny, night-time, rainy, and snowy.

0 papers · 0 benchmarks · Images

SENTINEL

Dataset card for SENTINEL: Mitigating Object Hallucinations via Sentence-Level Early Intervention. Paper: https://arxiv.org/abs/2507.12455 · Code: https://github.com/pspdada/SENTINEL

0 papers · 0 benchmarks · Images, Texts

NHR-Edit (NoHumansRequired Edit Dataset)

NHR-Edit is a training dataset for instruction-based image editing. Each sample consists of an input image, a natural language editing instruction, and the corresponding edited image. All samples are generated fully automatically using the NoHumansRequired pipeline, without any human annotation or filtering.

0 papers · 0 benchmarks · Images, Texts

COCO-Facet

COCO-Facet is a benchmark for attribute-focused text-to-image retrieval, comprising 9,112 queries with 100 candidate images each. The images are drawn from COCO, and the annotations come from existing annotations of COCO images (COCO, Visual7W, VisDial, COCO-Stuff).

0 papers · 0 benchmarks · Images, Texts

MC-Bench

A dataset for multi-context visual grounding.

0 papers · 0 benchmarks · Images