3,275 machine learning datasets
Large multimodal models extend the impressive capabilities of large language models by integrating multimodal understanding abilities. However, it is not clear how well they can emulate the general intelligence and reasoning ability of humans. As recognizing patterns and abstracting concepts are key to general intelligence, we introduce PuzzleVQA, a collection of puzzles based on abstract patterns. With this dataset, we evaluate large multimodal models on abstract patterns based on fundamental concepts, including colors, numbers, sizes, and shapes. Through our experiments on state-of-the-art large multimodal models, we find that they do not generalize well to simple abstract patterns. Notably, even GPT-4V cannot solve more than half of the puzzles. To diagnose the reasoning challenges in large multimodal models, we progressively guide the models with our ground-truth reasoning explanations for visual perception, inductive reasoning, and deductive reasoning. Our systematic analysis…
Please refer to: https://github.com/google/imageinwords/blob/main/datasets/IIW-400/README.md
We sample 2,025 image frames from the original KITTI dataset for Mono3DRefer, containing 41,140 expressions in total and a vocabulary of 5,271 words.
MULTI-Benchmark is a benchmark for evaluating Multimodal Large Language Models (MLLMs). It is designed to test the understanding of complex tables and images, and reasoning with long context.
MUTE is the first open-source Bengali hateful meme dataset, consisting of around 4,200 memes annotated with two labels: hate and not hate.
AnoVox is a large-scale benchmark for ANOmaly detection in autonomous driving. AnoVox incorporates multimodal sensor data and spatial VOXel ground truth, allowing methods to be compared independently of the sensors they use. AnoVox contains both content and temporal anomalies.
A screenshot-caption dataset containing 135k pairs of screenshots and captions extracted from Google Play.
WFDD is a dataset for benchmarking anomaly detection methods with a focus on textile inspection. It includes 4,101 woven fabric images grouped into four categories: grey cloth, grid cloth, yellow cloth, and pink flower. The first three categories are collected from the industrial production sites of WEIQIAO Textile, while the pink flower category is gathered from the publicly available Cloth Flaw Dataset. Each category contains block-shaped, point-like, and line-type defects with pixel-level annotations.
The Leuven-Haifa dataset contains 240 disc-centered fundus images of 224 unique patients (75 patients with normal tension glaucoma, 63 patients with high tension glaucoma, 30 patients with other eye diseases, and 56 healthy controls) from the University Hospitals of Leuven. The arterioles and venules in these images were annotated by master's students in medicine and corrected by a senior annotator. All senior segmentation corrections are provided, as well as the junior segmentations of the test set. An open-source toolbox for the parametrization of segmentations was also developed. Diagnosis, age, sex, vascular parameters, and a quality score are provided as metadata. Envisioned reuse includes the development or external validation of blood vessel segmentation algorithms, the study of the vasculature in glaucoma, and the development of glaucoma diagnosis algorithms. The dataset is available on the KU Leuven Research Data Repository (RDR).
Given the unavailability of real-world pharmaceutical inspection-domain datasets, we have created the Sensum Solid Oral Dosage Forms (SensumSODF) dataset intended for research and evaluation purposes.
Tiny ImageNetv2 is a subset of the ImageNetV2 (matched frequency) dataset by Recht et al. ("Do ImageNet Classifiers Generalize to ImageNet?"), with 2,000 images spanning all 200 classes of the Tiny ImageNet dataset. It is a test set constructed by collecting images from the classes shared by Tiny ImageNet and ImageNet. The images, resized to 64×64, were collected from Flickr after a decade of progress on the original ImageNet dataset, with a collection process designed to resemble the original ImageNet distribution. For further information on ImageNetV2, visit the original ImageNetV2 GitHub repository.
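The subset construction described above can be sketched as a simple class-intersection filter. This is an illustrative sketch only: the function and file layout below are assumptions, not the actual release scripts.

```python
# Hypothetical sketch: build a Tiny-ImageNetv2-style test split by keeping
# only the ImageNetV2 samples whose class (a WordNet ID) also appears in
# Tiny ImageNet. Names and the sample format are assumptions for illustration.

def joint_class_subset(imagenetv2_samples, tiny_imagenet_wnids):
    """Keep (path, wnid) samples whose WordNet ID is shared by both datasets."""
    shared = set(tiny_imagenet_wnids)
    return [(path, wnid) for path, wnid in imagenetv2_samples if wnid in shared]

# Toy example with made-up image paths and WordNet IDs:
v2 = [("img_0.jpg", "n01443537"), ("img_1.jpg", "n99999999")]
tiny = ["n01443537", "n01629819"]
print(joint_class_subset(v2, tiny))  # only the shared-class sample survives
```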
The Kvasir-VQA dataset is an extended dataset derived from the HyperKvasir and Kvasir-Instrument datasets, augmented with question-and-answer annotations. This dataset is designed to facilitate advanced machine learning tasks in gastrointestinal (GI) diagnostics, including image captioning, Visual Question Answering (VQA) and text-based generation of synthetic medical images.
A benchmark designed to evaluate MLLMs’ proficiency in understanding inter-object relationships and textual content.
This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).
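The "also viewed/also bought" links above naturally form a product graph. As a hedged sketch, assuming per-product metadata records keyed by an `asin` identifier with an `also_bought` list (field names are assumptions about the schema, not confirmed by the dataset description):

```python
# Hedged sketch: turn per-product "also bought" links into an adjacency map.
# The field names ("asin", "also_bought") are assumptions, shown only to
# illustrate the co-purchase graph structure.
from collections import defaultdict

def build_copurchase_graph(metadata_records):
    """Map each product ID to the set of products bought alongside it."""
    graph = defaultdict(set)
    for record in metadata_records:
        for neighbor in record.get("also_bought", []):
            graph[record["asin"]].add(neighbor)
            graph[neighbor].add(record["asin"])  # treat links as undirected
    return graph

records = [{"asin": "B001", "also_bought": ["B002", "B003"]}]
g = build_copurchase_graph(records)
print(sorted(g["B002"]))  # ['B001']
```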
FindingEmo is an image dataset containing annotations for 25k images, specifically tailored to Emotion Recognition. Unlike existing datasets, it focuses on complex scenes depicting multiple people in various naturalistic, social settings, with images annotated as a whole, thereby going beyond the traditional focus on faces or single individuals. Annotated dimensions include Valence, Arousal, and Emotion label, with annotations gathered using Prolific. Together with the annotations, we release the list of URLs pointing to the original images, as well as all associated source code.
EC-FUNSD is introduced in [arXiv:2402.02379] as a benchmark of semantic entity recognition (SER) and entity linking (EL), designed for the entity-centric robustness evaluation of pre-trained text-and-layout models (PTLMs).
ROOR is a reading order prediction (ROP) benchmark which annotates layout reading order as ordering relations.
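Annotating reading order as pairwise ordering relations means a full reading sequence can be recovered by topologically sorting the relations. The sketch below illustrates that idea only; it does not reflect the actual ROOR file format or element names.

```python
# Illustrative sketch (not the ROOR format): recover a full reading order
# from pairwise ordering relations via topological sort.
from graphlib import TopologicalSorter

def reading_order(relations):
    """relations: iterable of (earlier, later) pairs between layout elements."""
    ts = TopologicalSorter()
    for earlier, later in relations:
        ts.add(later, earlier)  # 'later' depends on 'earlier'
    return list(ts.static_order())

print(reading_order([("title", "para1"), ("para1", "para2")]))
# → ['title', 'para1', 'para2']
```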
OAM-TCD is a dataset of around 5k aerial images from around the world to support robust tree detection algorithms. Full details can be found on the linked HuggingFace repository.
The human brain receives nutrients and oxygen through an intricate network of blood vessels. Pathology affecting small vessels, at the mesoscopic scale, represents a critical vulnerability within the cerebral blood supply and can lead to severe conditions, such as Cerebral Small Vessel Diseases. The advent of 7 Tesla MRI systems has enabled the acquisition of higher spatial resolution images, making it possible to visualise such vessels in the brain. However, the lack of publicly available annotated datasets has impeded the development of robust, machine learning-driven segmentation algorithms. To address this, the SMILE-UHURA challenge was organised. This challenge, held in conjunction with ISBI 2023 in Cartagena de Indias, Colombia, aimed to provide a platform for researchers working on related topics. The SMILE-UHURA challenge addresses the gap in publicly available annotated datasets by providing an annotated dataset of Time-of-Flight angiography acquired with 7T MRI. This dataset…