3,275 machine learning datasets
The DADE dataset, short for Driving Agents in Dynamic Environments, is a synthetic dataset designed for the training and evaluation of methods for the task of semantic segmentation in the context of autonomous driving agents navigating dynamic environments and weather conditions.
Confocal fluorescence microscopy is one of the most accessible and widely used imaging techniques for the study of biological processes at the cellular and subcellular levels. Scanning confocal microscopy allows the capture of high-quality images from thick three-dimensional (3D) samples, yet suffers from well-known limitations such as photobleaching and phototoxicity of specimens caused by intense light exposure, which limit its use in some applications, especially for living cells. Cellular damage can be alleviated by changing imaging parameters to reduce light exposure, often at the expense of image quality. Machine/deep learning methods for single-image super-resolution (SISR) can be applied to restore image quality by upscaling lower-resolution (LR) images to produce high-resolution (HR) images. These SISR methods have been successfully applied to photo-realistic images, due partly to the abundance of publicly available data. In contrast, the lack of publicly available data partl
We present the World Wide Dishes dataset, which seeks to assess disparities in representations of food. It was gathered through a decentralised data collection effort that collects perspectives directly from people with a wide variety of backgrounds around the globe, with the aim of creating a dataset of their insights into their own experiences of foods relevant to their cultural, regional, national, or ethnic lives.
This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images.
Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio (TTA) technology, which relies on detailed and acoustically relevant textual descriptions, falls short in practical video dubbing applications. Existing datasets like AudioSet, AudioCaps, Clotho, Sound-of-Story, and WavCaps do not fully meet the requirements of real-world foley audio dubbing tasks. To address this, we introduce the Multi-modal Image and Narrative Text Dubbing Dataset (MINT), designed to enhance mainstream dubbing tasks such as literary story audiobook dubbing and image/silent video dubbing. In addition, to address the limitations of existing TTA technology in understanding and planning complex prompts, a Foley Audi
The COCO-WAN benchmark is designed to assess the impact of noise from weak annotations (combined with auto-annotation tools) on instance segmentation models. The benchmark is built upon the COCO dataset and incorporates noise generated through weak annotations, simulating real-world scenarios where annotations may be imperfect because they were produced with semi-automated tools. It includes several levels of noise to challenge the robustness and generalization capabilities of segmentation models.
This task aims to extract named entities and entity types while further predicting segmentation masks of visual objects.
We have developed a systematic method for constructing large text-annotated image databases designed for exploiting vision-language modeling for image quality assessment, and present the Text Annotated Distortion, Appearance and Content (TADAC) database, which contains over 1.6 million images annotated with texts about their semantic contents, distortion characteristics and appearance properties. We used existing labels or automatic image captioning to annotate the semantic content, designed a list of suitable textual phrases for describing the distortion characteristics, and developed automatic algorithms for computing the appearance properties, which we annotated with suitable textual descriptions. The TADAC database is the first of its kind to be annotated with all three types of quality-relevant texts, enabling the learning of high-level knowledge about all possible factors affecting image quality. TADAC has enabled the development of the first BIQA model (SLIQUE) that joint
The Synthetic Signature Bankcheck Images (SSBI) Dataset is the first publicly available dataset of bank check images with annotations for detecting handwritten components, including names, amounts, dates, and signatures. It also supports both writer-independent and writer-dependent signature verification tasks by providing labels for genuine and forged signatures and IDs of the signature authors.
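The difference between the two verification settings can be illustrated with a minimal sketch of a writer-independent split, in which no signature author appears in both the training and test sets. The `(writer_id, is_genuine)` record layout and the `writer_independent_split` helper are hypothetical and chosen only for illustration; they are not the SSBI file format or tooling.

```python
import random


def writer_independent_split(samples, test_fraction=0.2, seed=0):
    """Split samples so that no writer appears in both train and test.

    `samples` is a list of (writer_id, is_genuine) tuples. This record
    layout is an assumption for illustration, not the SSBI format.
    """
    # Collect the distinct writers and shuffle them reproducibly.
    writers = sorted({writer_id for writer_id, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(writers)

    # Hold out a fraction of the writers (at least one) for testing.
    n_test = max(1, int(len(writers) * test_fraction))
    test_writers = set(writers[:n_test])

    train = [s for s in samples if s[0] not in test_writers]
    test = [s for s in samples if s[0] in test_writers]
    return train, test


# Toy data: 5 writers, each with 2 genuine signatures and 1 forgery.
samples = [(w, g) for w in range(5) for g in (True, True, False)]
train, test = writer_independent_split(samples)
```

In a writer-dependent setting, by contrast, one would typically split each author's signatures between training and test so that the verifier has seen every writer.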
This dataset provides simulated flood inundation maps of Abu Dhabi's coast under 174 different shoreline protection scenarios. The maps were produced with a high-fidelity physics-based hydrodynamic simulator under a 0.5-meter sea level rise projection. The details of the hydrodynamic model are reported in [1].
GelSight Young's Modulus Dataset by Michael Burgess
Pulmonary hypertension (PH) is a syndrome complex that accompanies a number of diseases of different etiologies. It is associated with basic mechanisms of structural and functional change in the pulmonary circulation vessels and manifests as increased pressure in the pulmonary artery. The structural changes in the pulmonary circulation vessels are the main limiting factor determining the prognosis of patients with PH. Thickening and irreversible deposition of collagen in the walls of the pulmonary artery branches lead to rapid disease progression and decreased therapy effectiveness. In this regard, histological examination of the pulmonary circulation vessels is critical in both preclinical studies and clinical practice. However, measuring quantitative parameters such as the average vessel outer diameter, the vessel wall area, and the hypertrophy index requires a significant time investment and specialist training to analyze micrographs. A dataset of pulmonary circulation
A new in-context visual question answering dataset encompassing interleaved image and EHR data derived from MIMIC-IV and MIMIC-CXR-JPG databases.
VasTexture is a large, free repository of textures and PBR materials extracted from real-world images. The repository contains 500,000 highly diverse textures and PBR materials. All assets are free to download and use. The PBR materials and textures were extracted from natural images using an unsupervised approach (no human intervention). As a result, the textures and PBR materials are significantly more diverse, but also significantly less refined, than assets made using manual and AI approaches.
The Tecnalia Hyperspectral Dataset contains different non-ferrous fractions of Waste from Electric and Electronic Equipment (WEEE) of Copper, Brass, Aluminum, Stainless Steel and White Copper. Images were captured by a hyperspectral Specim PHF Fast10 camera that is able to capture wavelengths in the range 400 to 1000 nm with a spectral resolution of less than 1 nm. The PHF Fast10 camera is equipped with a CMOS sensor (1024 × 1024 resolution), a Camera Link interface and a special Fore objective OL10. The provided dataset contains 76 uniformly distributed wavelengths in the spectral range [415.05 nm, 1008.10 nm]. The illumination setup, as described in \cite{picon2012real}, was specifically designed to reduce the specular reflections generated by the surface of the non-ferrous materials and to provide a homogeneous and even illumination that covers the wavelengths to which the hyperspectral camera is sensitive. The illumination system consists of a parabolic surface that uniformly distributes the lig
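As a rough illustration of the band layout described in this entry, the sketch below reconstructs a wavelength grid from the stated band count and spectral range. It assumes the 76 bands are evenly spaced with both endpoints included, which the description suggests but does not state outright; under that assumption the spacing comes out at roughly 7.9 nm. The variable names are illustrative only.

```python
# Assumption: 76 evenly spaced bands, with both range endpoints included.
NUM_BANDS = 76
START_NM = 415.05
END_NM = 1008.10

# 76 bands span 75 equal intervals between the first and last wavelength.
step_nm = (END_NM - START_NM) / (NUM_BANDS - 1)

# Centre wavelength of each band under the even-spacing assumption.
wavelengths_nm = [START_NM + i * step_nm for i in range(NUM_BANDS)]

print(f"band spacing: {step_nm:.2f} nm")
print(f"first band: {wavelengths_nm[0]:.2f} nm, last band: {wavelengths_nm[-1]:.2f} nm")
```

If the dataset ships its own wavelength table, that table should take precedence over this reconstructed grid.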
FMARS is a large-scale dataset of Very High Resolution (VHR) remote sensing images with annotations generated using Vision Foundation Models. The dataset focuses on disaster management applications and provides pre-event imagery and annotations for major crisis events worldwide from 2021 to 2023.
The MMPD dataset was proposed in the ECCV 2024 paper "When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset".
THRawS is a new dataset of raw Sentinel-2 (S-2) satellite data containing warm-temperature hotspots such as wildfires and volcanic eruptions from around the world. The dataset aims to promote the development of energy-efficient pre-processing algorithms and AI models for onboard-satellite applications. A custom methodology was designed to identify events in raw data using the corresponding Level-1C (L1C) products, and a lightweight coarse coregistration and georeferencing strategy was employed to deal with unprocessed data. The dataset comprises over 100 samples, including wildfires, volcanic eruptions, and event-free volcanic areas, to enable warm-event detection and general classification applications. Finally, the performance of the proposed coarse spatial coregistration technique was compared with that of the SuperGlue Deep Neural Network method to highlight the different constraints in terms of timing and quality of spatial registration when minimizing spatial displacement error for a specific scene.