3,275 machine learning datasets
Contains ~60,000 HD images of Deformable Linear Objects (DLOs) generated using Blender. The dataset features a variety of industrial-looking backgrounds and includes instance segmentation masks. The main task for this dataset is DLO instance segmentation.
SOMPT22 is a multi-object tracking (MOT) benchmark focused on surveillance-style pedestrian tracking.
The first large-scale dataset for training and evaluating novel-view synthesis from blurred images.
A real-world low-light camera motion blur dataset for evaluating deblurring radiance fields methods.
The Wider-Test-200 dataset was introduced in the paper "Towards Unsupervised Blind Face Restoration using Diffusion Prior".
PsOCR is a large-scale synthetic dataset for Optical Character Recognition in the low-resource Pashto language.
The synRailObs dataset contains the following categories: Person, Rocks, Vehicles, Moto-cars, and Animals.
MVTec-AC is a curated refinement of the widely-used MVTec-AD dataset, specifically designed for anomaly classification—distinguishing between different types of anomalies rather than merely detecting if an image is anomalous. While MVTec-AD focuses on binary detection and suffers from mislabeled or ambiguous samples, MVTec-AC introduces manually corrected labels and reorganized anomaly categories to enable robust multi-class evaluation. Key improvements include the correction of 36 misclassified samples, merging of 4 overlapping classes, removal of 4 ambiguous ‘combined’ classes, and exclusion of the toothbrush category, which contains only a single trivial anomaly type. These changes support consistent, fine-grained assessment of classification models in industrial visual inspection contexts.
VisA-AC is a refined benchmark based on the VisA dataset, tailored for the task of anomaly classification—distinguishing between different types of anomalies rather than simply detecting whether an image is anomalous. While the original VisA provides anomaly type information in an Excel file, it includes numerous under-sampled and ambiguous classes. VisA-AC addresses these issues by removing classes with fewer than 10 samples, merging visually similar categories, and manually correcting mislabeled samples. Additionally, anomaly classes in VisA-AC are organized into separate folders—following the structure of MVTec-AC—for easier integration and usage. The resulting dataset ensures both statistical robustness and semantic clarity, supporting rigorous evaluation of multi-class anomaly classification methods in real-world industrial settings.
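The class-pruning step described above (removing anomaly classes with fewer than 10 samples) can be illustrated with a minimal sketch. The function name `drop_rare_classes`, the `(sample_id, anomaly_type)` label format, and the toy labels are all hypothetical and are not taken from the VisA-AC release; only the threshold of 10 comes from the description.

```python
from collections import Counter

MIN_SAMPLES = 10  # threshold stated in the dataset description


def drop_rare_classes(labels, min_samples=MIN_SAMPLES):
    """Keep only (sample_id, anomaly_type) pairs whose type has at least min_samples entries.

    `labels` is an assumed format for illustration, not the dataset's actual schema.
    """
    counts = Counter(anomaly_type for _, anomaly_type in labels)
    return [(sid, t) for sid, t in labels if counts[t] >= min_samples]


# Toy, made-up labels: 12 "scratch" samples and 3 "bent" samples.
toy_labels = [(f"img_{i:03d}", "scratch") for i in range(12)]
toy_labels += [(f"img_{i:03d}", "bent") for i in range(12, 15)]

kept = drop_rare_classes(toy_labels)
kept_classes = sorted({t for _, t in kept})
print(kept_classes, len(kept))
```

In this toy example the under-sampled "bent" class is dropped and the "scratch" class is kept. The merging of visually similar categories and the manual label corrections mentioned in the description are curation steps that a filter like this does not cover.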
We create the first open-source large-scale S2V generation dataset, OpenS2V-5M, which consists of five million high-quality 720P subject-text-video triples. To ensure subject-information diversity in our dataset, we (1) segment subjects and build pairing information via cross-video associations and (2) prompt GPT-Image on raw frames to synthesize multi-view representations. The dataset supports both Subject-to-Video and Text-to-Video generation tasks.
The Industrial Textile Defect Detection (ITDD) dataset includes 1,885 industrial textile images grouped into 4 categories: cotton fabric, dyed fabric, hemp fabric, and plaid fabric. The images were collected from the industrial production sites of WEIQIAO Textile. ITDD is an upgraded version of WFDD that reorganizes three original classes and adds one new class.
WebGen-Bench is created to benchmark LLM-based agents' ability to generate websites from scratch. The dataset is introduced in "WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch". It contains 101 instructions and 647 test cases. It also has a training set of 6,667 instructions, named WebGen-Instruct.
The first and only open dataset for Russian fingerspelling, containing 1,593 annotated phrases and over 37 thousand HD+ videos.
The Vehicles in the Middle East (VME) dataset is designed explicitly for vehicle detection in high-resolution satellite images from Middle Eastern countries. Sourced from Maxar, the VME dataset spans 54 cities across 12 countries, comprising over 4,000 image tiles and more than 100,000 vehicles, annotated using both manual and semi-automated methods. We also introduce the largest benchmark dataset for Car Detection in Satellite Imagery (CDSI), combining images from multiple sources to enhance global car detection.
The Matador dataset is a material image dataset with hierarchical labels derived from a new taxonomy. For each sample of a material, we collect a local appearance image, a local surface structure LiDAR scan, and a global context image, and we record any camera motion that takes place during the capture sequence. The dataset is intended to grow over time. To date, Matador contains 57 different material categories and a total of ~7,200 images, averaging about 126 samples per category to capture intraclass variance.
A collection of test sets for evaluating base and chat LLMs (incl. VLMs) on Greek generation and understanding capabilities.
U2-BENCH is the first large-scale benchmark for evaluating Large Vision-Language Models (LVLMs) on ultrasound imaging understanding. It provides a diverse, multi-task dataset curated from 40 licensed sources, covering 15 anatomical regions and 8 clinically inspired tasks across classification, detection, regression, and text generation.
This is the dataset released along with the publication:
The prospective upper body thermal images SARS-CoV2 association study was designed to test the hypothesis that thermal videos may aid in the early diagnosis of COVID-19. The study recorded a set of measurements from 252 participants covering PCR results, demographics, vital signs, participant activities, medications, and respiratory symptoms, along with a thermal video session in which the volunteers performed a simple breath-hold in four different positions. The acquired data may be used to test clinical association questions regarding temperature patterns, demographics, and vital signs. It could also be valuable for developing new computer algorithms that extract useful scientific information from thermal videos.