3,275 machine learning datasets
The dataset contains aerial images covering three commonly occurring natural disasters (earthquake/collapsed buildings, flood, and wildfire/fire), plus a normal class that does not depict any disaster. It consists of 167,723 aerial images divided into 4 classes. The dataset is an extension of the AIDER dataset (Aerial Image Dataset for Emergency Response Applications).
Wood plate bark removal is critical for ensuring the quality of wood processing and its products. To address the lack of datasets for applying deep learning methods in this field, and to fill the research gap in deep-learning-based wood plate bark removal equipment, this study proposes a benchmark for wood plate segmentation in bark removal processing.
LSASRD is a well-labeled, challenging dataset built to facilitate research on style recognition in anime images. It collects images from 190 anime and cartoon works spanning 93 years and 13 countries and regions, taking both 2D and 3D works into consideration, with at most ten characters chosen per work. All images were obtained from the Internet, mainly from existing anime and cartoons; some come from comics or games of the same anime series. Unlike illustration or video datasets, LSASRD provides a moderate amount of contextual information in a wide variety of styles, and thus requires context-understanding ability from image models.
The TimberVision dataset consists of more than 2k annotated RGB images containing a total of 51k trunk components, including cut and lateral surfaces, surpassing any existing dataset in this domain in both quantity and detail by a large margin. The dataset can be used to train models for oriented object detection and instance segmentation, and to evaluate the influence of multiple scene parameters on model performance. Additionally, a generic framework is provided that fuses the components detected by the models for both tasks into unified trunk representations. Geometric properties are derived automatically, and multi-object tracking is applied to further enhance robustness.
Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases.
This is the OSN-transmitted CelebA sampling dataset from the paper “DF-RAP: A Robust Adversarial Perturbation for Defending against Deepfakes in Real-world Social Network Scenarios”, collected by manual upload and download. The dataset includes 30,000 facial images of size 256×256 transmitted through online social networks (OSNs), together with their corresponding original images. Facebook, Twitter, WeChat, and Weibo were selected as the transmission OSNs, with 7,500 images each.
A dataset for testing the ability of vision-language models (VLMs) to recognize and match 3D objects that share the exact same 3D shape but differ in orientation, materials, textures, environments, and lighting conditions.
M²ConceptBase is a concept-centric multimodal knowledge base designed to bridge the gap between visual and linguistic semantics. It features 951K images and 152K concepts, with each concept linked to an average of 6.27 images and a detailed textual description.
The Construction Industry Steel Ordering Lists (CISOL) dataset comprises table-centric, real-world documents from the construction industry, annotated to facilitate the testing and training of deep learning models for table detection (TD) and table structure recognition (TSR).
MP-IDB comprises four species of malaria parasites: Falciparum, Malariae, Ovale, and Vivax. For each species, there are four distinct life stages, which are encoded in the filenames.
In this project, we tried to make malaria detection possible at a low cost. We present the M5-malaria Dataset, the first-ever dataset that spans multiple microscopes and multiple magnifications. Malaria, a fatal but curable disease, claims hundreds of thousands of lives every year. Early and correct diagnosis is vital to avoid health complications; however, it depends on the availability of costly microscopes and trained experts to analyze blood-smear slides. Deep-learning-based methods have the potential not only to decrease the burden on experts but also to improve diagnostic accuracy on low-cost microscopes. However, this is hampered by the absence of a reasonably sized dataset; one of the most challenging aspects is the reluctance of experts to annotate images captured at low magnification on low-cost microscopes. We present a dataset to further research on malaria microscopy with low-cost microscopes at low magnification. Our large-scale dataset consists of images of blood smears.
Late third instar wing imaginal discs were cultured in Shields and Sang M3 media (Sigma) supplemented with 2% FBS (Sigma), 1% pen/strep (Gibco), 3 ng/ml ecdysone (Sigma), and 2 ng/ml insulin (Sigma). Wing discs were cultured in 35 mm fluorodishes (WPI) under 12 mm filters (Millicell), as described in https://doi.org/10.1038/s41567-019-0618-1
The ENSeg dataset is an enhanced subset of the ENS dataset. The ENS dataset comprises image samples extracted from the enteric nervous system (ENS) of male adult Wistar rats (Rattus norvegicus, albinus variety), specifically from the jejunum, the second segment of the small intestine.
The Liver-US dataset is a comprehensive collection of high-quality ultrasound images of the liver, including both normal and abnormal cases. The dataset is designed to facilitate research in medical image classification, with a focus on liver-related conditions. It includes a diverse range of ultrasound images acquired from multiple clinical settings, providing a robust foundation for developing and validating machine learning models in medical image analysis.
The SimBEV dataset is a collection of 320 scenes spread across all 11 CARLA maps and contains data from a variety of sensors, including five camera types (RGB, semantic segmentation, instance segmentation, depth, and optical flow), lidar, semantic lidar, radar, GNSS, and IMU, along with 3D object bounding boxes and accurate bird's-eye view (BEV) ground truth. With each scene lasting 16 seconds at a frame rate of 20 Hz, the SimBEV dataset contains 102,400 annotated frames, over 8 million 3D object bounding boxes, and more than 2.5 billion BEV ground truth labels.
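The reported annotated-frame count follows directly from the stated scene count, scene duration, and frame rate; a minimal sanity check (variable names are illustrative):

```python
# Verify SimBEV's reported frame count from the figures stated above.
scenes = 320          # scenes across the 11 CARLA maps
duration_s = 16       # seconds per scene
frame_rate_hz = 20    # frames per second

frames = scenes * duration_s * frame_rate_hz
print(frames)  # 102400, matching the reported number of annotated frames
```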
TUMTraffic-VideoQA is a novel dataset designed for spatiotemporal video understanding in complex roadside traffic scenarios. The dataset comprises 1,000 videos, featuring 85,000 multiple-choice QA pairs, 2,300 object captioning annotations, and 5,700 object grounding annotations, encompassing diverse real-world conditions such as adverse weather and traffic anomalies. By incorporating tuple-based spatiotemporal object expressions, TUMTraffic-VideoQA unifies three essential tasks (multiple-choice video question answering, referred object captioning, and spatiotemporal object grounding) within a cohesive evaluation framework.
We introduce TextAtlas5M, a dataset specifically designed for training and evaluating multimodal generation models on dense-text image generation.
Physical concept understanding benchmark.