3,275 machine learning datasets
MagicBathyNet is a benchmark dataset made up of image patches from Sentinel-2, SPOT-6 and aerial imagery, bathymetry in raster format, and seabed class annotations. The dataset also facilitates unsupervised learning for model pre-training in shallow coastal areas.
The ULS23 test set contains 725 lesions from 284 patients of the Radboudumc and JBZ hospitals in the Netherlands. It is intended to measure the performance of 3D universal lesion segmentation models for Computed Tomography (CT). To prepare the data, radiological reports from both participating institutions were searched using NLP tools to identify patients with measurable target lesions, indicating that these lesions were clinically relevant. A random sample of patients was selected, 56.3% of whom were male, covering a diverse range of scanner manufacturers. The lesions were annotated in 3D by expert radiologists with over 10 years of experience in reading oncological scans. ULS23 is an open benchmark, and we invite ongoing submissions to advance the development of future ULS models.
The Calandra dataset provides data from a pair of tactile sensors attached to a jaw gripper (left and right), alongside RGB images. A triplet of samples was captured "before", "during", and "after" grasping each of a wide variety of objects. The objective is to determine the success or failure of the grasp attempt.
Background: Lung cancer risk classification is an increasingly important area of research, as low-dose thoracic CT screening programs have become the standard of care for patients at high risk of lung cancer. There is limited availability of large, annotated public databases for training and testing lung nodule classification algorithms.
The dataset was created to address the crucial need for effective Extreme Weather Events Detection (EWED), an increasingly urgent task due to the rising frequency of such events driven by global warming. Traditional methods for EWED rely on numerical threshold setting and the analysis of weather anomaly heatmaps visualizing data such as temperature, wind speed, and precipitation. However, these methods often involve manual work and can be time-consuming and error-prone. While advances in AI have led to the development of machine learning models such as Convolutional Neural Networks (CNNs) for weather prediction and EWED, these models predominantly use numeric data and often yield low accuracy. Moreover, despite the proficiency of Large Language Models (LLMs) in generating textual weather reports, they struggle with interpreting visual data, which is crucial for EWED. General Vision-Language Models (VLMs) likewise face challenges in accurately interpreting meteorological heatmaps.
LUMA is a multimodal dataset consisting of audio, image, and text modalities. It allows controlled injection of uncertainty into the data and is mainly intended for studying uncertainty quantification in multimodal classification settings. This repository provides the audio and text modalities. The image modality consists of images from the CIFAR-10/100 datasets. To download the image modality and compile the dataset with a specified amount of uncertainty, please use the LUMA compilation tool.
The ARC-AGI benchmark is a significant measure of an AI system's general reasoning capabilities. Recently, GPT-4o reportedly reached a 50% score on the ARC-AGI benchmark, surpassing the previous best score of 34%. The benchmark consists of tasks in which the system must infer the underlying rule from a few example input-output pairs and produce the correct output for a new problem grid.
EVD4UAV is an altitude-sensitive benchmark dataset designed for studying the evasion of vehicle detection in Unmanned Aerial Vehicle (UAV) imagery. It is specifically curated to facilitate the study of adversarial patch-based attacks on vehicle detectors in UAV images. The dataset comprises a diverse set of images captured at various altitudes with fine-grained annotations, making it a robust platform for evaluating the performance of object detectors under adversarial conditions. Notably, the dataset includes around 3,000 images depicting winter scenarios where vehicles may be partially or fully covered by snow, providing a unique challenge for vehicle detection algorithms.
Multi-modal sarcasm detection has attracted much recent attention. Nevertheless, the existing benchmark (MMSD) has shortcomings that hinder the development of reliable multi-modal sarcasm detection systems: (1) MMSD contains spurious cues, which lead models to learn biases; (2) the negative samples in MMSD are not always reasonable. To solve these issues, we introduce MMSD2.0, a corrected dataset that fixes the shortcomings of MMSD by removing the spurious cues and re-annotating the unreasonable samples. Meanwhile, we present a novel framework called multi-view CLIP that is capable of leveraging multi-grained cues from multiple perspectives (i.e., the text, image, and text-image interaction views) for multi-modal sarcasm detection. Extensive experiments show that MMSD2.0 is a valuable benchmark for building reliable multi-modal sarcasm detection systems and that multi-view CLIP significantly outperforms the previous best baselines (with a 5.6% improvement).
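A schematic sketch of the multi-view idea described above, not the authors' implementation: CLIP encoders supply a text view and an image view, a simple elementwise product stands in for the text-image interaction view, and per-view logits are averaged. The model name, head sizes, and fusion choice are all assumptions.

```python
import torch.nn as nn
from transformers import CLIPModel

class MultiViewSarcasmClassifier(nn.Module):
    """Toy multi-view classifier: text view, image view, and a
    text-image interaction view, each with its own binary head."""
    def __init__(self, clip_name="openai/clip-vit-base-patch32", dim=512):
        super().__init__()
        self.clip = CLIPModel.from_pretrained(clip_name)
        self.text_head = nn.Linear(dim, 2)
        self.image_head = nn.Linear(dim, 2)
        self.inter_head = nn.Linear(dim, 2)  # interaction view head

    def forward(self, input_ids, attention_mask, pixel_values):
        t = self.clip.get_text_features(input_ids=input_ids,
                                        attention_mask=attention_mask)
        v = self.clip.get_image_features(pixel_values=pixel_values)
        # Average logits from the three views (sarcastic vs. not).
        return (self.text_head(t) + self.image_head(v)
                + self.inter_head(t * v)) / 3
```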
The HOI-Synth benchmark extends three egocentric datasets designed to study hand-object interaction detection (EPIC-KITCHENS VISOR, EgoHOS, and ENIGMA-51) with automatically labeled synthetic data obtained through a novel HOI generation pipeline.
GVLQA is the first vision-language QA dataset for general graph reasoning. It contains a base set, GVLQA-BASE, and four image-augmented subsets, GVLQA-AUGLY, GVLQA-AUGNO, GVLQA-AUGNS, and GVLQA-AUGET, whose samples correspond to those in the base set. It covers 7 graph reasoning tasks: cycle detection, connectivity, topological ordering, shortest path, maximum flow, maximum bipartite matching, and Hamiltonian path. The dataset can be used to (1) evaluate the graph reasoning capabilities of VLMs and LLMs, and (2) serve as a pre-training dataset to help models acquire fundamental graph comprehension and reasoning abilities.
VinDr-Mammo is a large-scale benchmark dataset of full-field digital mammography, consisting of 5,000 four-view exams with breast-level assessment and finding annotations. Each exam was independently double read, with any discordance resolved by arbitration by a third radiologist.
Existing raindrop removal datasets have two shortcomings. First, they consist of images captured by cameras focused on the background, leading to blurry raindrops. To our knowledge, none of these datasets include images where the focus is specifically on the raindrops, which results in a blurry background. Second, these datasets predominantly consist of daytime images and thus lack nighttime raindrop scenarios. Consequently, algorithms trained on these datasets may struggle to perform effectively in raindrop-focused or nighttime scenarios. The absence of datasets specifically designed for raindrop-focused and nighttime raindrops constrains research in this area. In this paper, we introduce a large-scale, real-world raindrop removal dataset called Raindrop Clarity. Raindrop Clarity comprises 15,186 high-quality pairs/triplets (raindrop, blur, and background) of images with raindrops and the corresponding clear background images: 5,442 daytime raindrop images and 9,744 nighttime raindrop images.
Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark for robust image-text matching/retrieval models. It contains 100K image-text pairs consisting of website pages and multilingual website meta-descriptions (98,000 pairs for training, 1,000 for validation, and 1,000 for testing). NoW has two main characteristics: it requires no human annotations, and its noisy pairs are naturally occurring. The source image data of NoW were obtained by taking screenshots when accessing web pages on a mobile user interface (MUI) at 720 $\times$ 1280 resolution, and the captions were parsed from the meta-description field of the HTML source code. In NCR (the predecessor of NCL), each image in all datasets was preprocessed using the Faster R-CNN detector provided by the Bottom-up Attention Model to generate 36 region proposals, each encoded as a 2048-dimensional feature. Following NCR, we therefore release the features instead of the raw images for fair comparison.
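A minimal sketch of consuming such pre-extracted region features, assuming a hypothetical one-file-per-image layout (the actual release format and paths may differ):

```python
import numpy as np

# Hypothetical layout: one .npy array per image holding the 36
# Faster R-CNN region features described above ([36, 2048], float32).
features = np.load("now_features/train/000001.npy")
assert features.shape == (36, 2048)

# A common first step in image-text matching: mean-pool the regions
# into a single global image descriptor.
global_descriptor = features.mean(axis=0)  # shape (2048,)
```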
Tiny ImageNet-R is a subset of the ImageNet-R dataset by Hendrycks et al. ("The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization") with 10,456 images spanning 62 of the 200 Tiny ImageNet classes. It is a test set obtained by collecting ImageNet-R images belonging to the classes shared by Tiny ImageNet and ImageNet. The images, resized to 64×64, comprise art, cartoons, deviantart, graffiti, embroidery, graphics, origami, paintings, patterns, plastic objects, plush objects, sculptures, sketches, tattoos, toys, and video game renditions of ImageNet classes. For further information on ImageNet-R, visit the original ImageNet-R GitHub repository.
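A minimal sketch of how such a joint-class subset can be derived, assuming the standard tiny-imagenet-200 layout with a wnids.txt file and an imagenet-r directory containing one folder per WordNet ID (the paths are illustrative):

```python
from pathlib import Path

# Illustrative paths; both datasets name classes by WordNet IDs (wnids).
tiny_wnids = set(Path("tiny-imagenet-200/wnids.txt").read_text().split())
inr_wnids = {p.name for p in Path("imagenet-r").iterdir() if p.is_dir()}

# Classes present in both datasets define the Tiny ImageNet-R label set.
joint_classes = sorted(tiny_wnids & inr_wnids)
print(len(joint_classes))  # 62 shared classes, per the description above
```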
MedMNIST-C is an open-source dataset collection comprising algorithmically generated corruptions applied to the test sets of the MedMNIST collection, following the concept of ImageNet-C. To maintain the integrity of the medical data, we have excluded any weather-dependent corruptions ("Snow", "Frost", "Fog"). Hence, each dataset in the MedMNIST-C collection comprises 16 different corruptions (12 test corruptions and 4 validation corruptions) spanning 5 severity levels. For further information on the corruptions, visit the original ImageNet-C GitHub repository.
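To illustrate the ImageNet-C-style corruption/severity scheme referenced above, here is a sketch using the imagecorruptions package released by the ImageNet-C authors; MedMNIST-C's own generation code and corruption list may differ:

```python
import numpy as np
from imagecorruptions import corrupt

# A dummy RGB image (uint8, HWC) standing in for a MedMNIST test sample.
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)

# Each corruption is applied at severity levels 1 (mild) through 5 (severe).
for severity in range(1, 6):
    corrupted = corrupt(image, corruption_name="gaussian_noise",
                        severity=severity)
```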
The ARKitFace dataset was established by this work to train and evaluate both 3D face shape and 6DoF pose estimation under perspective projection. A total of 500 volunteers, aged 9 to 60, were invited to record the dataset. They sit in a random environment with the 3D acquisition equipment fixed in front of them at a distance ranging from about 0.3 m to 0.9 m. Each subject is asked to perform 33 specific expressions with two head movements (from looking left to looking right, and from looking up to looking down). The 3D acquisition equipment is an iPhone 11: the shape and location of the face are tracked by its structured-light sensor, and the triangle mesh and 6DoF information for the RGB images are obtained with the built-in ARKit toolbox. The triangle mesh consists of 1,220 vertices and 2,304 triangles. In total, 902,724 2D facial images (resolution 1280×720 or 1440×1280) with ground-truth 3D mesh and 6DoF pose annotations were collected.
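A minimal sketch of how a 6DoF pose and camera intrinsics map such a mesh into the image plane under perspective projection; the function and argument names are illustrative, not part of the dataset's tooling:

```python
import numpy as np

def project_vertices(vertices, R, t, K):
    """Project 3D mesh vertices (N, 3) to pixel coordinates using a 6DoF
    pose (3x3 rotation R, translation t) and pinhole intrinsics K (3x3)."""
    cam = vertices @ R.T + t       # rigid transform: world -> camera frame
    uv = cam @ K.T                 # apply the intrinsic matrix
    return uv[:, :2] / uv[:, 2:3]  # perspective divide -> (u, v) pixels
```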
SA-Det-100k is a large-scale class-agnostic object detection dataset for research purposes only. The dataset is based on a subset of SA-1B (see LICENSE), and all objects are annotated under a single class-agnostic category. Because it covers a large number of scenarios but provides no class-specific annotations, it may be a good choice for pre-training models for a variety of downstream tasks with different categories. The dataset contains about 100k images, each resized with opencv-python so that the larger of its height and width is 1333 pixels, consistent with the data augmentation commonly used for training on COCO. For an example project based on this dataset, see Relation-DETR (https://github.com/xiuqhou/Relation-DETR).
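A minimal sketch of the resizing step described above, assuming opencv-python; the function name and interpolation choice are illustrative:

```python
import cv2

def resize_longer_side(image, target=1333):
    """Scale an image so the larger of its height/width equals `target`,
    preserving the aspect ratio (as in the preprocessing described above)."""
    h, w = image.shape[:2]
    scale = target / max(h, w)
    new_size = (round(w * scale), round(h * scale))  # cv2 wants (width, height)
    return cv2.resize(image, new_size, interpolation=cv2.INTER_LINEAR)
```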
The availability of high-quality datasets plays a crucial role in advancing research and development, especially for safety-critical and autonomous systems. In this paper, we present AssistTaxi, a comprehensive novel dataset of images for runway and taxiway analysis. The dataset comprises more than 300,000 frames of diverse, carefully collected data gathered from the Melbourne (MLB) and Grant-Valkaria (X59) general aviation airports. The importance of AssistTaxi lies in its potential to advance autonomous operations, enabling researchers and developers to train and evaluate algorithms for efficient and safe taxiing.
The Industrial Objects in Varied Contexts (InVar) dataset was internally produced by our team and contains 100 object classes in 20,800 total images (208 images per class). The objects consist of common automotive, machine, and robotics lab parts. Each class contains 4 sub-categories (52 images each) with different attributes and visual complexities.