3,275 machine learning datasets
ESP dataset (Evaluation for Styled Prompt dataset) is a new benchmark for zero-shot domain-conditional caption generation. The dataset aims to evaluate the capability to generate diverse domain-specific language conditioned on the same image. It comprises 4.8k captions from 1k images in the COCO Captions test set. Using Amazon MTurk, we collected captions in five everyday text domains: blog, social media, instruction, story, and news.
Pathfinder and Pathfinder-X have proven instrumental for training and testing large language models on long-range dependencies. Recently, Meta's Moving Average Equipped Gated Attention model scored 97% on the Pathfinder-X dataset, indicating a need for a larger, more challenging dataset. Whereas Pathfinder-X only went up to 256 x 256 pixel images (a sequence length of 65,536 tokens), Pathfinder-X2 introduces images of 512 x 512 pixels, or 262,144 tokens.
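The token counts above follow directly from flattening the pixel grid, one token per pixel; a quick sanity check:

```python
# Sequence length for Pathfinder-style tasks: an N x N image
# flattened to one token per pixel gives N * N tokens.
def seq_len(side: int) -> int:
    return side * side

assert seq_len(256) == 65_536    # Pathfinder-X
assert seq_len(512) == 262_144   # Pathfinder-X2
print(seq_len(512))  # 262144
```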
A Filipino multi-modal language dataset for text+visual tasks, consisting of 351,755 Filipino news articles gathered from Filipino news outlets.
X-Wines is a consistent wine dataset containing 100,646 wine instances and 21 million real evaluations carried out by users. Data were collected on the open Web in 2022 and pre-processed for wider free use. The ratings use a 1–5 scale and were given over a period of 10 years (2012–2021) for wines produced in 62 different countries.
The dataset comprises patches of size 512x512 pixels collected from the Sentinel-2 L2A satellite mission. All reported forest fires are located in California. For each area of interest, two images are provided: a pre-fire acquisition and a post-fire acquisition. Each image is composed of 12 different channels, covering the visible spectrum, infrared, and ultra blue.
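A minimal sketch of how a pre/post-fire pair with the documented shape might be handled as arrays; the file names and loader below are hypothetical (the real dataset's storage format is not specified here), and random data stands in for actual patches:

```python
import numpy as np

# Hypothetical stand-in for one Sentinel-2 L2A patch:
# 512 x 512 pixels, 12 spectral channels, channel-last layout.
def load_patch(path: str) -> np.ndarray:
    # A real pipeline would read the file at `path`; here we
    # synthesize data with the documented shape instead.
    rng = np.random.default_rng(0)
    return rng.random((512, 512, 12), dtype=np.float32)

pre = load_patch("pre_fire.npy")    # pre-fire acquisition (hypothetical name)
post = load_patch("post_fire.npy")  # post-fire acquisition (hypothetical name)
pair = np.stack([pre, post])        # shape: (2, 512, 512, 12)
print(pair.shape)
```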
Abstract: We introduce the multi-spectral stereo (MS2) outdoor dataset, including stereo RGB, stereo NIR, stereo thermal, stereo LiDAR data, and GPS/IMU information. Our dataset provides rectified and synchronized 184K data pairs taken from city, residential, road, campus, and suburban areas in the morning, daytime, and nighttime under clear-sky, cloudy, and rainy conditions. We designed the dataset to explore various computer vision algorithms from multi-spectral sensor data to achieve high-level performance, reliability, and robustness against challenging environments.
The dataset is organized into 24 typical scenarios, showcasing the richness of real-world environments, conditions, and objects. It is carefully curated to reflect diverse and realistic situations, allowing models to be tested and refined under a wide range of conditions.
Multi-grained Vehicle Parsing (MVP) is a large-scale dataset for semantic analysis of vehicles in the wild, with several featured properties. 1. MVP contains 24,000 vehicle images captured in real-world surveillance scenes, which makes it more scalable for real applications. 2. For different requirements, we annotate the vehicle images with pixel-level part masks in two granularities, i.e., coarse annotations of ten classes and fine annotations of 59 classes. The former can be applied to object-level applications such as vehicle Re-Id, fine-grained classification, and pose estimation, while the latter can be explored for high-quality image generation and content manipulation. 3. The images reflect the complexity of real surveillance scenes, such as different viewpoints, illumination conditions, and backgrounds. In addition, the vehicles span diverse countries, types, brands, models, and colors, which makes the dataset more diverse and challenging.
Im-Promptu Visual Analogy Suite is a meta-learning framework. Each visual analogy suite is divided into two broad kinds of analogies depending on the underlying relation: Primitive tasks and Composite tasks.
A free face dataset made for students and teachers. It contains 10,000 photos with an equal distribution of race and gender, along with metadata and facial landmarks. Free to use for research with citation. Photos by Generated.Photos.
UT Zappos50K (UT-Zap50K) is a large shoe dataset consisting of 50,025 catalog images collected from Zappos.com. The images are divided into 4 major categories — shoes, sandals, slippers, and boots — followed by functional types and individual brands. The shoes are centered on a white background and pictured in the same orientation for convenient analysis. This dataset is created in the context of an online shopping task, where users pay special attention to fine-grained visual differences. For instance, it is more likely that a shopper is deciding between two pairs of similar men's running shoes instead of between a woman's high heel and a man's slipper. GIST and LAB color features are provided. In addition, each image has 8 associated meta-data labels (gender, materials, etc.) that are used to filter the shoes on Zappos.com. We introduced this dataset in the context of a pairwise comparison task, where the goal is to predict which of two images more strongly exhibits a visual attribute.
This is a machine-learning-ready glaucoma dataset using a balanced subset of standardized fundus images from the Rotterdam EyePACS AIROGS train set. The dataset is split into training, validation, and test folders containing 2,500, 270, and 500 fundus images per class, respectively. Each split has a folder for each class: referable glaucoma (RG) and non-referable glaucoma (NRG).
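The per-class counts above imply the following split totals; a small sketch (folder names follow the description, counts are as stated):

```python
# Per-class image counts for each split, as described above.
splits = {
    "train": {"RG": 2500, "NRG": 2500},
    "validation": {"RG": 270, "NRG": 270},
    "test": {"RG": 500, "NRG": 500},
}

# Total images per split = sum over the two balanced classes.
totals = {split: sum(classes.values()) for split, classes in splits.items()}
print(totals)  # {'train': 5000, 'validation': 540, 'test': 1000}
```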
Standardized Multi-Channel Dataset for Glaucoma (SMDG-19) is a collection and standardization of 19 public datasets, comprising full-fundus glaucoma images, associated image metadata such as optic disc segmentation, optic cup segmentation, and blood vessel segmentation, and any provided per-instance text metadata such as sex and age. This dataset is the largest public repository of fundus images with glaucoma.
Laser powder bed fusion (LPBF) is an additive manufacturing (3D printing) process for metals. RAISE-LPBF is a large dataset on the effect of laser power and laser dot speed in 316L stainless steel bulk material. Both process parameters are independently sampled for each scan line from a continuous distribution, so interactions of different parameter choices can be investigated. Process monitoring comprises on-axis high-speed (20k FPS) video. The data can be used to derive statistical properties of LPBF, as well as to build anomaly detectors.
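A hedged sketch of the per-scan-line parameter sampling described above. The uniform ranges here are purely illustrative; the actual distribution and bounds used in RAISE-LPBF may differ:

```python
import numpy as np

rng = np.random.default_rng(42)
n_scan_lines = 1000

# Illustrative ranges only — not the dataset's actual bounds.
power = rng.uniform(100.0, 400.0, n_scan_lines)   # laser power
speed = rng.uniform(400.0, 1600.0, n_scan_lines)  # laser dot speed

# Independent per-line sampling yields near-zero correlation between
# the two parameters, which is what allows their interactions to be
# studied without confounding.
corr = np.corrcoef(power, speed)[0, 1]
print(round(corr, 3))
```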
Datasets at https://zenodo.org/record/8105485 for the motion-robust CMR reconstruction code at https://github.com/syedmurtazaarshad/motion-robust-CMR.
Context: As mentioned in the reference paper:
We present a dataset for the analysis of human affective states using functional near-infrared spectroscopy (fNIRS). Data were recorded from thirty-one participants who engaged in two tasks. In the emotional perception task, the participants passively viewed images sampled from the standard International Affective Picture System database, which provided ground-truth valence and arousal annotation for the stimuli. In the affective imagery task, the participants actively imagined emotional scenarios and then rated these for subjective valence and arousal. Correlates between the fNIRS signal and the valence-arousal ratings were investigated to estimate the validity of the dataset. Source code and summaries are provided for a processing pipeline, brain activity group analysis, and estimating baseline classification performance. For classification, prediction experiments are conducted for single-trial 4-class classification of arousal and valence, as well as cross-participant classification.
Multi-domain Image Editing Benchmark
The volumetric representation of human interactions is one of the fundamental domains in the development of immersive media productions and telecommunication applications. Particularly in the context of the rapid advancement of Extended Reality (XR) applications, volumetric data has proven to be an essential technology for future XR development. In this work, we present a new multimodal database to help advance the development of immersive technologies. Our proposed database provides ethically compliant and diverse volumetric data, in particular 27 participants displaying posed facial expressions and subtle body movements while speaking, plus 11 participants wearing head-mounted displays (HMDs). The recording system consists of a volumetric capture (VoCap) studio, including 31 synchronized modules with 62 RGB cameras and 31 depth cameras. In addition to textured meshes, point clouds, and multi-view RGB-D data, we use one Lytro Illum camera to provide light field (LF) data simultaneously.
The withoutbg100 dataset consists of 100 image and alpha matte pairs. These pairs are chosen to represent a wide range of subjects and complexities, specifically crafted to enhance and test the capabilities of image background removal algorithms. The dataset includes images with complex elements such as fur and objects with varying transparency levels, providing a substantial challenge to even advanced matting techniques.