3,275 machine learning datasets
RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic
ARCADE: Automatic Region-based Coronary Artery Disease diagnostics using x-ray angiography imagEs. Phase 2 of the dataset consists of two folders with 300 images each, together with annotations.
ShapeTalk contains over half a million discriminative utterances produced by contrasting the shapes of common 3D objects for a variety of object classes and degrees of similarity. The dataset provides discriminative utterances for a total of 36,391 shapes across 30 object classes. Overall, ShapeTalk contains 73,799 distinct contexts and a total of 536,596 utterances.
WebLINX is a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. It covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios.
Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors. We introduce OOD-CV, a benchmark dataset that includes out-of-distribution examples of 10 object categories in terms of pose, shape, texture, context, and weather conditions, and enables benchmarking models for image classification, object detection, and 3D pose estimation. In addition to this novel dataset, we contribute extensive experiments using popular baseline methods, which reveal that: 1. Some nuisance factors have a much stronger negative effect on performance than others, depending also on the vision task. 2. Current approaches to enhance robustness have only marginal effects, and can even reduce robustness. 3. We do not observe significant differences between convolutional and transformer architectures.
Earth vision research typically focuses on extracting geospatial object locations and categories but neglects the exploration of relations between objects and comprehensive reasoning. Based on city planning needs, we develop a multi-modal multi-task VQA dataset (EarthVQA) to advance relational reasoning-based judging, counting, and comprehensive analysis. The EarthVQA dataset contains 6000 images, corresponding semantic masks, and 208,593 QA pairs with urban and rural governance requirements embedded.
Drone Surveillance of Faces is a large-scale drone dataset intended to facilitate research on face recognition using drones.
IndustReal is an ego-centric, multi-modal dataset where 27 participants are challenged to perform assembly and maintenance procedures on a construction-toy car. The dataset is annotated for action recognition, assembly state detection, and procedure step recognition. IndustReal includes 38 execution errors in a total of 84 videos, with 14 exclusive to validation and test sets and therefore suitable for testing the robustness of algorithms against unseen errors in procedural tasks. IndustReal offers open-source 3D models for all parts to promote the use of synthetic data for scalable approaches on this dataset, as well as reproducibility. All assembly parts used in the dataset are 3D printed. This ensures reproducibility and future availability of the model and allows for growth via community effort.
Since H36M is captured in a controlled environment, it rarely depicts challenging real-world scenarios such as body occlusions that are the main source of ambiguity in the single-view 3D shape estimation problem. Hence, we construct an adapted version of H36M with synthetically-generated occlusions by randomly hiding a subset of the 2D keypoints and re-computing an image crop around the remaining visible joints.
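The described augmentation can be sketched as a small routine: randomly hide a fraction of the 2D keypoints, then recompute an axis-aligned crop around the joints that remain visible. This is an illustrative sketch, not the authors' implementation; the hide fraction, margin, and the `occlude_and_crop` helper are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def occlude_and_crop(keypoints_2d, hide_fraction=0.3, margin=20):
    """Hypothetical sketch of the described augmentation: randomly hide a
    subset of 2D keypoints, then recompute a crop around the visible ones.
    `keypoints_2d` is an (N, 2) array of pixel coordinates."""
    n = len(keypoints_2d)
    n_hidden = int(round(hide_fraction * n))
    hidden = rng.choice(n, size=n_hidden, replace=False)
    visible = np.ones(n, dtype=bool)
    visible[hidden] = False
    vis_pts = keypoints_2d[visible]
    # Axis-aligned crop around the remaining visible joints, with a margin.
    x0, y0 = vis_pts.min(axis=0) - margin
    x1, y1 = vis_pts.max(axis=0) + margin
    return visible, (x0, y0, x1, y1)

kps = rng.uniform(0, 256, size=(17, 2))  # e.g. a 17-joint skeleton
visible, crop = occlude_and_crop(kps)
```

The crop shrinks toward whichever joints survive the random hiding, so the resulting images exhibit occlusion-like truncation even though the underlying capture is occlusion-free.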
This dataset arises from the READ project (Horizon 2020).
Floods are among the most common and devastating natural hazards, imposing immense costs on our society and economy due to their disastrous consequences. Recent progress in weather prediction and spaceborne flood mapping demonstrated the feasibility of anticipating extreme events and reliably detecting their catastrophic effects afterwards. However, these efforts are rarely linked to one another, and there is a critical lack of datasets and benchmarks to enable the direct forecasting of flood extent. To resolve this issue, we curate a novel dataset enabling a timely prediction of flood extent. Furthermore, we provide a representative evaluation of state-of-the-art methods, structured into two benchmark tracks for forecasting flood inundation maps i) in general and ii) focused on coastal regions. Altogether, our dataset and benchmark provide a comprehensive platform for evaluating flood forecasts, enabling future solutions for this critical challenge. Data, code, and models are shared.
The ShipSG dataset is the first public dataset of its kind for ship segmentation and georeferencing. The dataset has been made available for the development and evaluation of instance segmentation and georeferencing methods using computer vision and deep learning, thus advancing the research field of ship recognition for maritime situational awareness. ShipSG consists of 3,505 images of a port location, captured using two different cameras with static, oblique views. In total, 11,625 ship masks grouped into seven ship classes were manually annotated. Moreover, each image contains one geographic annotation (latitude and longitude) for one of the ship masks present in the image.
Ultra-high definition benchmark (UHDBench) includes 2293 images at 2k resolution sourced from the ground-truth test sets of HRSOD, LIU4k, UAVid, UHDM, and UHRSD. Images smaller than 2560 × 1440 are excluded.
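The stated inclusion criterion can be written as a one-line filter; a minimal sketch, assuming image sizes are available as `(width, height)` tuples (the `meets_uhd_threshold` helper is hypothetical, not official benchmark tooling):

```python
MIN_W, MIN_H = 2560, 1440  # exclusion threshold stated by the benchmark

def meets_uhd_threshold(size):
    """Keep only images at least 2560 x 1440 in both dimensions."""
    w, h = size
    return w >= MIN_W and h >= MIN_H

sizes = [(3840, 2160), (2560, 1440), (1920, 1080)]
kept = [s for s in sizes if meets_uhd_threshold(s)]
```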
This dataset is a new benchmark grounded in real-world usage, developed to support more authentic and comprehensive evaluation of image editing models.
A large, realistic multimodal dataset consisting of real personal photos and crowd-sourced questions/answers.
Thyroid is a dataset for detection of thyroid diseases, in which patients diagnosed as hypothyroid or subnormal are anomalies against normal patients. It contains 2,800 training instances and 972 test instances, each with roughly 29 attributes.
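The setup above, normal patients only in the reference profile, disease cases as anomalies at test time, can be sketched with a simple z-score detector on stand-in data. This is an illustrative sketch, not the benchmark's protocol: the synthetic features, 4-attribute stand-in (real data has roughly 29), and the threshold of 3 are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the Thyroid setup: fit a profile on normal training patients,
# then flag test instances that deviate strongly from it. Counts mirror the
# dataset (2,800 train, 972 test); features here are synthetic.
train_normal = rng.normal(0.0, 1.0, size=(2800, 4))
test_normal = rng.normal(0.0, 1.0, size=(900, 4))
test_anomalous = rng.normal(6.0, 1.0, size=(72, 4))  # hypothyroid/subnormal stand-ins
test = np.vstack([test_normal, test_anomalous])

mu, sigma = train_normal.mean(axis=0), train_normal.std(axis=0)

def anomaly_score(x):
    """Mean absolute z-score relative to the normal training profile."""
    return np.abs((x - mu) / sigma).mean(axis=1)

flags = anomaly_score(test) > 3.0  # threshold is a free design choice
```

Real evaluations on this dataset would typically use a learned detector (e.g. an isolation forest or autoencoder) rather than raw z-scores, but the train/test split logic is the same.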
A clickthrough prediction dataset. For more information, please see the Kaggle page.
RITE (Retinal Images vessel Tree Extraction) is a database that enables comparative studies on segmentation and classification of arteries and veins in retinal fundus images. It was established based on the publicly available DRIVE database (Digital Retinal Images for Vessel Extraction).
The Middlebury 2006 is a stereo dataset of indoor scenes with multiple handcrafted layouts.
The Market1501-Attributes dataset is built from the Market1501 dataset, augmenting it with 28 hand-annotated attributes such as gender, age, sleeve length, flags for carried items, and upper- and lower-body clothing colors.