3,275 machine learning datasets
This is an image splicing dataset that includes different types of preprocessing and postprocessing techniques. Foreground objects are taken from the HRSOD dataset and background images from the BG20K dataset. 95,000 training and 5,000 test images are provided.
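A minimal sketch of how one such spliced sample could be composed, assuming HRSOD-style saliency masks for the foreground objects (file names, mask format, and paste position are illustrative assumptions, not details of the dataset):

```python
from PIL import Image

def compose_splice(fg_path, mask_path, bg_path, position=(0, 0)):
    """Paste a foreground object onto a background using its
    saliency mask as a per-pixel alpha channel (illustrative only)."""
    fg = Image.open(fg_path).convert("RGB")
    mask = Image.open(mask_path).convert("L")   # assumed binary saliency mask
    bg = Image.open(bg_path).convert("RGB")
    bg.paste(fg, position, mask)                # mask controls which pixels are pasted
    return bg

# Hypothetical file layout, for illustration only.
spliced = compose_splice("hrsod/obj_0001.jpg",
                         "hrsod/obj_0001_mask.png",
                         "bg20k/bg_0001.jpg",
                         position=(120, 80))
spliced.save("spliced_0001.png")
```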
This dataset contains images and annotations for scene text detection and recognition. It consists of two parts: (1) 1,175 images manually labeled with a total of 59,588 text instances at the line and word levels; and (2) 929 signboard images collected from the VinText, Total-Text, and ICDAR15 datasets. Each text instance in the first part has a quadrilateral bounding box and an associated ground-truth character sequence. In the second part, images were selected if they contain signboards; this portion comprises 20,261 text instances at the word level, bringing the total number of text instances in the final dataset to 79,814. Following the ICDAR15 standard, we annotated each image with all text instances present, including their polygons and content. All images were annotated manually.
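Since the annotations follow the ICDAR15 standard, each ground-truth line is typically eight comma-separated quadrilateral coordinates followed by the transcription, with "###" marking illegible text. A minimal parser sketch (the file name and BOM handling are common ICDAR15 conventions, assumed rather than confirmed for this dataset):

```python
def parse_icdar15_line(line):
    """Parse one ICDAR15-style ground-truth line:
    x1,y1,x2,y2,x3,y3,x4,y4,transcription"""
    parts = line.strip().split(",", 8)          # transcription itself may contain commas
    coords = list(map(int, parts[:8]))
    quad = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    text = parts[8]
    return {"quad": quad, "text": text, "ignore": text == "###"}

# Hypothetical ground-truth file; ICDAR15 files often start with a BOM.
with open("gt_img_1.txt", encoding="utf-8-sig") as f:
    instances = [parse_icdar15_line(ln) for ln in f if ln.strip()]
```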
The GenAI-Bench benchmark consists of 1,600 challenging real-world text prompts sourced from professional designers. Compared to benchmarks such as PartiPrompt and T2I-CompBench, GenAI-Bench captures a wider range of aspects of compositional text-to-visual generation, ranging from basic (scene, attribute, relation) to advanced (counting, comparison, differentiation, logic). GenAI-Bench also collects human alignment ratings (1-to-5 Likert scales) on images and videos generated by ten leading models, such as Stable Diffusion, DALL-E 3, Midjourney v6, Pika v1, and Gen2.
Multimodal Large Language Models (MLLMs) have shown significant promise in various applications, attracting broad interest from researchers and practitioners alike. However, a comprehensive evaluation of their long-context capabilities remains underexplored. To address this gap, we introduce the MultiModal Needle-in-a-haystack (MMNeedle) benchmark, specifically designed to assess the long-context capabilities of MLLMs. Besides multi-image input, we employ image stitching to further increase the input context length, and we develop a protocol to automatically generate labels for sub-image-level retrieval. Essentially, MMNeedle evaluates MLLMs by stress-testing their ability to locate a target sub-image (needle) within a set of images (haystack) based on textual instructions and descriptions of image contents. This setup requires an advanced understanding of extensive visual contexts and effective information retrieval within long-context image inputs. With this benchmark, we evaluate state-of-the-art MLLMs.
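To illustrate the stitching protocol, the sketch below tiles sub-images into a grid and records which cell holds the needle. The grid size, tile size, and (row, column) label format are illustrative assumptions, not the benchmark's exact configuration:

```python
import random
from PIL import Image

def stitch_haystack(image_paths, rows=4, cols=4, tile=(256, 256)):
    """Stitch sub-images into a rows x cols grid; return the stitched
    image, the needle's path, and its (row, col) retrieval label."""
    assert len(image_paths) == rows * cols
    canvas = Image.new("RGB", (cols * tile[0], rows * tile[1]))
    for idx, path in enumerate(image_paths):
        r, c = divmod(idx, cols)
        canvas.paste(Image.open(path).convert("RGB").resize(tile),
                     (c * tile[0], r * tile[1]))
    needle_idx = random.randrange(rows * cols)
    label = divmod(needle_idx, cols)            # automatic sub-image-level label
    return canvas, image_paths[needle_idx], label
```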
The Temporal Logic Video (TLV) Dataset addresses the scarcity of state-of-the-art video datasets for long-horizon, temporally extended activity and object detection. It comprises two main components.
The MS-EVS Dataset is the first large-scale event-based dataset for face detection.
The Lusitano dataset was collected over a three-month period, from January to March, at Paulo de Oliveira, S.A., a prominent textile company based in Covilhã, Portugal, renowned for its innovative contributions to the textile industry. To collect the images, we placed a camera in front of a fabric inspection machine, along with a strong and nearly uniform light source. The dataset comprises 4096 × 1024 images captured by an industrial-grade Teledyne Dalsa Linea camera, whose high resolution and precision ensure an accurate depiction of textile samples with the level of detail necessary for defect analysis. None of the defects depicted in this dataset are artificially generated; they stem from genuine occurrences observed during the collection period and thus represent real-world challenges encountered in textile production. The dataset also includes defect-free (normal) images. We provide two folders, train and test, sharing the same folder structure.
We release AntM2C, a large-scale Multi-Scenario Multi-Modal CTR dataset built from real industrial data from Alipay. The dataset offers impressive breadth and depth, covering CTR data from four diverse business scenarios: advertisements, consumer coupons, mini-programs, and videos. Unlike existing datasets, AntM2C provides not only ID-based features but also five textual features and one image feature for both users and items, supporting more fine-grained multi-modal CTR prediction.
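A sketch of what one AntM2C-style record might look like; all field names here are invented for illustration, and only the feature counts and scenario list come from the description above:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AntM2CRecord:
    # ID-based features (illustrative names)
    user_id: str
    item_id: str
    scenario: str            # one of: ads, coupons, mini-programs, videos
    # five textual features (names assumed; provided for users and items)
    text_features: List[str] = field(default_factory=list)
    # one image feature, e.g. a path or a precomputed embedding reference
    item_image: str = ""
    # click-through label
    clicked: int = 0
```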
The deliberate manipulation of public opinion, especially through altered images, poses a significant danger to society. To fight this issue on a technical level, we support the research community by releasing the Digital Forensics 2023 (DF2023) training and validation dataset.
Introduced by Khan et al. in "Divide and conquer: Ill-light image enhancement via hybrid deep network" (https://www.sciencedirect.com/science/article/abs/pii/S0957417421004759).
InpaintCOCO is a benchmark for probing fine-grained concept understanding in vision-language models, similar to Winoground. To our knowledge, InpaintCOCO is the first benchmark consisting of image pairs with minimal differences, so that visual representations can be analyzed in a more standardized setting.
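A minimal sketch of a Winoground-style image score over such minimal-difference pairs, assuming a generic score(image, caption) similarity function (e.g., CLIP cosine similarity), which the dataset itself does not prescribe:

```python
def image_score(score, pair):
    """Winoground-style image score: for each caption, the matching
    image must score higher than the mismatched one."""
    i0, i1, c0, c1 = pair  # (image_0, image_1, caption_0, caption_1)
    return int(score(i0, c0) > score(i1, c0) and
               score(i1, c1) > score(i0, c1))

def evaluate(score, pairs):
    """Fraction of pairs where the model prefers the correct image."""
    return sum(image_score(score, p) for p in pairs) / len(pairs)
```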
Medical report generation (MRG), which aims to automatically generate a textual description of a specific medical image (e.g., a chest X-ray), has recently received increasing research interest. Building on the success of image captioning, MRG has become achievable. However, generating language-specific radiology reports poses a challenge for data-driven models due to their reliance on paired image-report chest X-ray datasets, which are labor-intensive, time-consuming, and costly to create. In this paper, we introduce a chest X-ray benchmark dataset, CASIA-CXR, consisting of high-resolution chest radiographs accompanied by narrative reports originally written in French. To the best of our knowledge, this is the first public chest radiograph dataset with medical reports in this language. Importantly, we propose a simple yet effective multimodal encoder-decoder contextually-guided framework for medical report generation in French. We validated our framework through intra-language evaluation.
Overview: The IITKGP_Fence dataset is designed for tasks related to fence-like occlusion detection, defocus blur, depth mapping, and object segmentation. The captured data varies in scene composition, background defocus, and object occlusions. The dataset comprises both labeled and unlabeled data, as well as additional video and RGB-D data. It contains ground-truth occlusion masks (GT) for the corresponding images; we created the ground-truth occlusion labels semi-automatically with user interaction.
Post-Spraying Image Evaluation: This dataset accompanies the paper "Deep Learning for Precision Agriculture: Post-Spraying Evaluation and Deposition Estimation" (https://arxiv.org/abs/2409.16213).
CodeSCAN is the first large-scale and diverse dataset of coding screenshots with pixel-perfect annotations.
SimNICT is the first dataset for training universal non-ideal measurement CT (NICT) enhancement models.
This dataset is part of my bachelor thesis project. It was created by combining multiple open-source datasets from RoboFlow Universe with additional manual annotation.
We introduce the low-light image enhancement benchmark dataset "Low-light Images of Streets (LoLI-Street)," which contains three subsets: train, validation, and test. The train and validation sets consist of 30k and 3k paired low-light and high-light images, respectively, and the real low-light test set (RLLT) contains 1k images captured under real-world low-light conditions, for a total of 33k images.
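Evaluation on the paired train/validation splits would typically use a full-reference metric; below is a minimal PSNR sketch for one enhanced/ground-truth pair (the metric choice and file names are assumptions, not specified by the dataset):

```python
import numpy as np
from PIL import Image

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio between two same-sized uint8 images."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

# Hypothetical file names for one paired sample.
enhanced = Image.open("enhanced_0001.png")
reference = Image.open("highlight_0001.png")   # paired high-light ground truth
print(f"PSNR: {psnr(enhanced, reference):.2f} dB")
```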
The Media-Text dataset comprises images of banners, posters, covers, and other imagery characteristic of the media industry.