3,275 machine learning datasets
Traditional Chinese medicinal plants are often used to prevent and treat human diseases. Since different medicinal plants have different therapeutic effects, plant recognition is an important topic. Traditional identification of medicinal plants relies mainly on human experts, which cannot meet the growing demands of clinical practice. Artificial Intelligence (AI) research on plant recognition faces challenges due to the lack of a comprehensive medicinal plant dataset. We therefore present a Chinese medicinal plant dataset that includes 52,089 images in 300 categories. Compared with existing medicinal plant datasets, ours has more categories and fine-grained plant parts to facilitate comprehensive plant recognition. The plant images were collected through the Bing search engine and cleaned by a pretrained vision foundation model with human verification. Our dataset promotes the development and validation of advanced AI models for robust and accurate medicinal plant recognition.
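As a rough illustration of this cleaning step, the sketch below filters candidate images with an off-the-shelf CLIP model; the model name, prompt template, and 0.25 threshold are all assumptions, not the authors' actual pipeline.

```python
# A minimal sketch of filtering web-scraped plant images with a pretrained
# vision-language model before human verification. Model, prompt, and
# threshold are illustrative assumptions.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def keep_image(path: str, category: str, threshold: float = 0.25) -> bool:
    """Return True if the image plausibly depicts the named plant category."""
    image = Image.open(path).convert("RGB")
    inputs = processor(text=[f"a photo of {category}"], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_image = logit_scale * cosine similarity; undo the scaling
    # to recover a cosine score in [-1, 1].
    score = out.logits_per_image.item() / model.logit_scale.exp().item()
    return score > threshold
```

Images falling below the threshold would then go to human verification rather than being dropped outright.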
We introduce FortisAVQA, a dataset designed to assess the robustness of AVQA models. Its construction involves two key processes: rephrasing and splitting. Rephrasing modifies questions from the test set of MUSIC-AVQA to enhance linguistic diversity, thereby mitigating the reliance of models on spurious correlations between key question terms and answers. Splitting entails the automatic and reasonable categorization of questions into frequent (head) and rare (tail) subsets, enabling a more comprehensive evaluation of model performance in both in-distribution and out-of-distribution scenarios.
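As one plausible reading of the splitting step, the sketch below partitions questions into head and tail subsets by answer frequency; the `head_fraction` cutoff and record fields are illustrative assumptions, not the FortisAVQA procedure.

```python
# A minimal sketch (not the FortisAVQA procedure) of splitting questions
# into frequent (head) and rare (tail) subsets by answer frequency.
from collections import Counter

def head_tail_split(samples, head_fraction=0.8):
    """Answers covering the top `head_fraction` of mass form the head."""
    counts = Counter(s["answer"] for s in samples)
    total = sum(counts.values())
    head_answers, covered = set(), 0
    for answer, n in counts.most_common():
        if covered / total >= head_fraction:
            break
        head_answers.add(answer)
        covered += n
    head = [s for s in samples if s["answer"] in head_answers]
    tail = [s for s in samples if s["answer"] not in head_answers]
    return head, tail

samples = [{"question": "How many instruments sound?", "answer": "two"},
           {"question": "Is the violin playing?", "answer": "yes"},
           {"question": "Is the cello playing?", "answer": "yes"}]
head, tail = head_tail_split(samples, head_fraction=0.5)
```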
This collection consists of DICOM images and DICOM Segmentation Objects (DSOs) for 197 patients with Colorectal Liver Metastases (CRLM). It comprises a large, single-institution, consecutive series of patients who underwent resection of CRLM, with matched preoperative computed tomography (CT) scans for quantitative image analysis. Inclusion criteria were (a) pathologically confirmed resected CRLM, (b) available data from pathologic analysis of the underlying non-tumoral liver parenchyma and hepatic tumor, and (c) an available preoperative conventional portal venous contrast-enhanced multi-detector computed tomography (MDCT) scan performed within 6 weeks of hepatic resection. Patients who died within 90 days or had less than 24 months of follow-up were excluded. Additionally, because pathologic and radiographic alterations of the non-tumoral liver parenchyma caused by hepatic artery infusion (HAI) of chemotherapy are not well described, any patient who received preoperative HAI was excluded.
WTA (Wind Turbine Aerial) and TLA (Transmission Line Aerial) are public datasets which contain a set of RGB images from wind turbine farms and transmission towers and power lines, along with semantic ground truth for relevant classes. This is the official repository of the paper: WTA/TLA: A UAV-captured Dataset for Semantic Segmentation of Energy Infrastructure (url).
Applications of unmanned aerial vehicles (UAVs) in logistics, agricultural automation, urban management, and emergency response depend heavily on oriented object detection (OOD) to enhance visual perception. Although existing UAV OOD datasets provide valuable resources, they are often designed for specific downstream tasks. Consequently, they exhibit limited generalization performance in real flight scenarios and fail to thoroughly demonstrate algorithm effectiveness in practical environments. To bridge this critical gap, we introduce CODrone, a comprehensive oriented object detection dataset for UAVs that accurately reflects real-world conditions. It also serves as a new benchmark designed to align with downstream task requirements, ensuring greater applicability and robustness in UAV-based OOD. Based on application requirements, we identify four key limitations in current UAV OOD datasets: low image resolution, limited object categories, single-view imaging, and restricted
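For readers unfamiliar with OOD annotations, the sketch below shows a common oriented-bounding-box parameterization, (cx, cy, w, h, angle), and its conversion to corner points; this is generic geometry, not CODrone's annotation format, which the description above does not specify.

```python
# A minimal sketch of an oriented bounding box in (cx, cy, w, h, angle) form,
# a common OOD parameterization; the corner conversion is generic geometry.
import math

def obb_to_corners(cx, cy, w, h, angle_rad):
    """Return the four corner (x, y) points of a rotated rectangle."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + c * dx - s * dy, cy + s * dx + c * dy) for dx, dy in half]

corners = obb_to_corners(100.0, 50.0, 40.0, 20.0, math.radians(30))
```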
Data in this study come from western Ecuador's Choco tropical forest, including the \textit{Fundación para la Conservación de los Andes Tropicales Reserve and adjacent Reserva Ecológica Mache-Chindul park} (FCAT; 00$^\circ$23'28'' N, 79$^\circ$41'05'' W), \textit{Jama-Coaque Ecological Reserve} (00$^\circ$06'57'' S, 80$^\circ$07'29'' W), \textit{Canande Reserve} (0$^\circ$31'34'' N, 79$^\circ$12'47'' W), and \textit{Tesoro Escondido Reserve} (0$^\circ$33'16'' N, 79$^\circ$10'31'' W). FCAT is a high-diversity humid tropical forest at $\sim$500 m elevation, receiving $\sim$3000 mm yr$^{-1}$ of precipitation with persistent fog during drier periods. Jama-Coaque ranges from the boundary of tropical moist deciduous/tropical moist evergreen forest at lower elevations ($\sim$1000 mm precipitation yr$^{-1}$, $\sim$250 m asl) to fog-inundated wet evergreen forests from 580 m to 800 m. Canande (350–500 m elevation) and Tesoro Escondido ($\sim$200 m elevation) are lowland ever-wet Choco forests, both
Released by the Shanghai Central Meteorological Observatory (SCMO) in 2020, this dataset records several years of historical precipitation events in the Yangtze River Delta area. It contains a total of 43,000 samples of precipitation events, of which 40,000 are for training and 3,000 for testing. Each sample consists of 20 consecutive radar echo frames spanning 3 hours, where the first 10 frames are at 6-minute intervals and the last 10 at 12-minute intervals. Each echo frame has a resolution of 460 × 460 pixels and covers a 460 km × 398 km region. We additionally split out 3,000 samples from the training set for validation.
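The per-sample layout described above can be pictured as a small array; in the sketch below the array name, dtype, and timestamp construction are assumptions, while the shapes and intervals follow from the description.

```python
# A minimal sketch of one sample's layout as described above.
import numpy as np

# 20 consecutive radar echo frames, each 460 x 460 pixels (460 km x 398 km).
sample = np.zeros((20, 460, 460), dtype=np.float32)

# Frame times in minutes: the first 10 frames every 6 minutes (0..54), the
# last 10 every 12 minutes (66..174), so a sample spans roughly 3 hours.
minutes = [6 * i for i in range(10)] + [54 + 12 * (j + 1) for j in range(10)]
```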
The Burmese Handwritten Digit Dataset (BHDD) is a dataset created specifically for recognizing handwritten Burmese digits. It is a Burmese counterpart of the MNIST dataset, with a training set of 60,000 examples and a test set of 27,561 examples.
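If BHDD is distributed in MNIST's IDX binary format, as its framing suggests (an assumption; check the actual release), the images could be read with a standard IDX parser like the sketch below.

```python
# A minimal sketch of reading MNIST-style IDX image files, assuming BHDD
# uses the same binary format as MNIST.
import gzip
import struct
import numpy as np

def read_idx_images(path: str) -> np.ndarray:
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        # Header: magic number, image count, rows, cols (big-endian uint32).
        magic, n, rows, cols = struct.unpack(">IIII", f.read(16))
        assert magic == 2051, "not an IDX image file"
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(n, rows, cols)
```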
The Tornado Network (TorNet) dataset is a large, high-resolution benchmark dataset developed to support machine learning research in tornado detection and prediction. It comprises over 200,000 radar samples derived from 9 years of full-resolution, polarimetric WSR-88D (NEXRAD) level-II and level-III radar data. Each sample, called a "chip," includes multiple radar variables—such as reflectivity, radial velocity, spectrum width, differential reflectivity, correlation coefficient, and specific differential phase—captured across two elevation angles and four time steps spaced five minutes apart. Rather than converting radar data to Cartesian coordinates, TorNet retains its native polar format, preserving spatial fidelity near the radar site. This level of detail enables the dataset to support a wide range of machine learning techniques, including deep learning models that can learn directly from raw radar imagery.
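A TorNet-style chip can be pictured as a single multi-dimensional array; in the sketch below the axis ordering and the polar grid size are placeholder assumptions, while the variable list, two elevation angles, and four time steps come from the description.

```python
# A minimal sketch of a TorNet-style chip; spatial dimensions here are
# placeholders, since real chips keep their native polar resolution.
import numpy as np

VARIABLES = ["reflectivity", "radial_velocity", "spectrum_width",
             "differential_reflectivity", "correlation_coefficient",
             "specific_differential_phase"]
ELEVATIONS = 2   # two elevation angles
TIMESTEPS = 4    # four time steps, five minutes apart

# Polar grid: range gates x azimuth rays (illustrative sizes).
RANGE_GATES, AZIMUTHS = 240, 360
chip = np.zeros((TIMESTEPS, ELEVATIONS, len(VARIABLES), RANGE_GATES, AZIMUTHS),
                dtype=np.float32)
```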
The PEARL dataset comprises 30K pedestrian images, each annotated with 25 attribute categories spanning 146 sub-attributes. We have collected images from outdoor surveillance that reflect practical applications and challenges, comprehensively covering nearly all critical attributes relevant to security surveillance, including body posture, accessories, bag types, clothing styles, colors, and activities. For diversity, we have extracted images from twelve countries, covering seven distinct public locations: streets, parks, airports, stations, college campuses, beaches, and marketplaces. Additionally, we have incorporated four distinct weather conditions: sunny, night-time, rainy, and snowy.
Dataset Card for SENTINEL: Mitigating Object Hallucinations via Sentence-Level Early Intervention. Paper: https://arxiv.org/abs/2507.12455 | Code: https://github.com/pspdada/SENTINEL
NHR-Edit is a training dataset for instruction-based image editing. Each sample consists of an input image, a natural language editing instruction, and the corresponding edited image. All samples are generated fully automatically using the NoHumanRequired pipeline, without any human annotation or filtering.
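One NHR-Edit-style record might look like the sketch below; the field names are illustrative assumptions rather than the dataset's actual schema.

```python
# A minimal sketch of one instruction-editing record as described above;
# field names are hypothetical.
sample = {
    "input_image": "input.png",                   # source image
    "instruction": "make the sky sunset orange",  # natural-language edit
    "edited_image": "output.png",                 # automatically generated result
}
```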
COCO-Facet is a benchmark for attribute-focused text-to-image retrieval, comprising 9,112 queries, each with 100 candidate images. The images are drawn from COCO, and the annotations come from existing annotation sources for COCO images (COCO, Visual7W, VisDial, COCO-Stuff).
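With a fixed pool of 100 candidates per query, retrieval quality is naturally scored with Recall@K; the sketch below shows one such computation over random scores and is an assumption, not the benchmark's official protocol.

```python
# A minimal sketch of Recall@K over fixed 100-candidate pools; scores and
# gold indices here are random placeholders.
import numpy as np

def recall_at_k(scores: np.ndarray, target_idx: np.ndarray, k: int = 1) -> float:
    """scores: (num_queries, 100) similarities; target_idx: gold image index."""
    topk = np.argsort(-scores, axis=1)[:, :k]   # indices of top-k candidates
    hits = (topk == target_idx[:, None]).any(axis=1)
    return float(hits.mean())

rng = np.random.default_rng(0)
scores = rng.standard_normal((9112, 100))
targets = rng.integers(0, 100, size=9112)
print(recall_at_k(scores, targets, k=1))
```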
A dataset for multi-context visual grounding.