Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets


ESP Dataset (Evaluation for Styled Prompt Dataset)

The ESP dataset (Evaluation for Styled Prompt dataset) is a new benchmark for zero-shot domain-conditional caption generation. It evaluates the capability to generate diverse domain-specific language conditioned on the same image. The dataset comprises 4.8k captions for 1k images from the COCO Captions test set. We collected captions in five everyday text domains (blog, social media, instruction, story, and news) using Amazon MTurk.

0 papers · 0 benchmarks · Images, Texts

Pathfinder-X2

Pathfinder and Pathfinder-X have proven instrumental in training and testing large language models on long-range dependencies. Recently, Meta's Moving Average Equipped Gated Attention model scored 97% on the Pathfinder-X dataset, indicating the need for a larger, more challenging dataset. Whereas Pathfinder-X only went up to 256 x 256 pixel images (a sequence length of 65,536 tokens), Pathfinder-X2 introduces 512 x 512 pixel images, or 262,144 tokens.

0 papers · 0 benchmarks · Images
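The stated sequence lengths follow directly from flattening each square image to one token per pixel; a quick sanity check:

```python
# One token per pixel: flattening a square Pathfinder image of side
# `side_px` yields a sequence of side_px * side_px tokens.
def sequence_length(side_px: int) -> int:
    return side_px * side_px

assert sequence_length(256) == 65_536    # Pathfinder-X
assert sequence_length(512) == 262_144   # Pathfinder-X2
```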

BalitaNLP

A Filipino multi-modal language dataset for text+visual tasks. Consists of 351,755 Filipino news articles gathered from Filipino news outlets.

0 papers · 0 benchmarks · Images, Texts

X-Wines (A Wine Dataset for Recommender Systems and Machine Learning)

X-Wines is a consistent wine dataset containing 100,646 instances and 21 million real ratings carried out by users. Data were collected on the open web in 2022 and pre-processed for wider free use. The ratings use a 1–5 scale and were collected over a 10-year period (2012–2021) for wines produced in 62 different countries.

0 papers · 0 benchmarks · Images, Ranking, Tabular, Texts, Time series

ChaBuD (Change detection for Burned area Delineation)

The dataset comprises 512x512-pixel patches collected from the Sentinel-2 L2A satellite mission. All reported forest fires are located in California. For each area of interest, two images are provided: a pre-fire acquisition and a post-fire acquisition. Each image consists of 12 channels covering the visible spectrum, infrared, and ultra blue.

0 papers · 0 benchmarks · Images, Time series

Multi-Spectral Stereo Dataset (RGB, NIR, thermal images, LiDAR, GPS/IMU)

Abstract: We introduce the multi-spectral stereo (MS2) outdoor dataset, including stereo RGB, stereo NIR, stereo thermal, stereo LiDAR data, and GPS/IMU information. Our dataset provides rectified and synchronized 184K data pairs taken from city, residential, road, campus, and suburban areas in the morning, daytime, and nighttime under clear-sky, cloudy, and rainy conditions. We designed the dataset to explore various computer vision algorithms from multi-spectral sensor data to achieve high-level performance, reliability, and robustness against challenging environments.

0 papers · 0 benchmarks · Images, LiDAR, Point cloud, Stereo

maadaa-FaEco Dataset (maadaa.ai Fashion & e-Commerce Open Dataset)

The dataset is organized into 24 typical scenarios, showcasing the richness of real-world environments, conditions, and objects. It is carefully curated to reflect diverse and realistic situations, allowing models to be tested and refined under a wide range of conditions.

0 papers · 0 benchmarks · Images

MVP-24K (Multi-grained Vehicle Parsing dataset)

Multi-grained Vehicle Parsing (MVP) is a large-scale dataset for semantic analysis of vehicles in the wild, with several notable properties. 1. MVP contains 24,000 vehicle images captured in real-world surveillance scenes, which makes it more scalable for real applications. 2. To meet different requirements, the vehicle images are annotated with pixel-level part masks at two granularities: coarse annotations with ten classes and fine annotations with 59 classes. The former can be applied to object-level applications such as vehicle Re-Id, fine-grained classification, and pose estimation, while the latter can be explored for high-quality image generation and content manipulation. 3. The images reflect the complexity of real surveillance scenes, such as different viewpoints, illumination conditions, and backgrounds. In addition, the vehicles span diverse countries, types, brands, models, and colors, which makes the dataset more diverse and challenging.

0 papers · 0 benchmarks · Images

Im-Promptu Visual Analogy Suite

Im-Promptu Visual Analogy Suite is a meta-learning framework. Each visual analogy suite is divided into two broad kinds of analogies depending on the underlying relation: Primitive and Composite tasks.

0 papers · 0 benchmarks · Images

Face dataset by Generated Photos (Face dataset for Academics by Generated Photos)

A free face dataset made for students and teachers. It contains 10,000 photos with an equal distribution across race and gender, along with metadata and facial landmarks. Free to use for research with the citation "Photos by Generated.Photos".

0 papers · 0 benchmarks · Images

UT-Zappos50K

UT Zappos50K (UT-Zap50K) is a large shoe dataset consisting of 50,025 catalog images collected from Zappos.com. The images are divided into 4 major categories (shoes, sandals, slippers, and boots), followed by functional types and individual brands. The shoes are centered on a white background and pictured in the same orientation for convenient analysis. The dataset was created in the context of an online shopping task, where users pay special attention to fine-grained visual differences. For instance, it is more likely that a shopper is deciding between two pairs of similar men's running shoes than between a woman's high heel and a man's slipper. GIST and LAB color features are provided. In addition, each image has 8 associated meta-data labels (gender, materials, etc.) that are used to filter the shoes on Zappos.com. We introduced this dataset in the context of a pairwise comparison task, where the goal is to predict which of two images more strongly exhibits a visual attribute.

0 papers · 0 benchmarks · Images

EyePACS-light (v1) (EyePACS-AIROGS-light-v1)

This is a machine-learning-ready glaucoma dataset built from a balanced subset of standardized fundus images from the Rotterdam EyePACS AIROGS train set. The dataset is split into training, validation, and test folders containing 2,500, 270, and 500 fundus images per class, respectively. Each split has a folder for each class: referable glaucoma (RG) and non-referable glaucoma (NRG).

0 papers · 0 benchmarks · Images, Medical
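The description implies a simple split/class directory layout; a minimal sketch of that structure (the root and folder names are assumptions, not the archive's documented paths):

```python
from pathlib import Path

# Assumed layout from the description: one folder per split, one
# subfolder per class, with the stated number of images per class.
ROOT = Path("eyepacs_light_v1")          # hypothetical root directory
IMAGES_PER_CLASS = {"train": 2500, "validation": 270, "test": 500}
CLASSES = ("RG", "NRG")                  # referable / non-referable glaucoma

for split, n in IMAGES_PER_CLASS.items():
    for cls in CLASSES:
        print(f"{ROOT / split / cls}: {n} images expected")

total = sum(n * len(CLASSES) for n in IMAGES_PER_CLASS.values())
print(f"total images: {total}")  # 6540
```

A layout like this loads directly with any folder-per-class image loader.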

SMDG (Standardized Multi-Channel Dataset for Glaucoma)

Standardized Multi-Channel Dataset for Glaucoma (SMDG-19) is a collection and standardization of 19 public datasets, comprising full-fundus glaucoma images, associated image annotations such as optic disc, optic cup, and blood vessel segmentations, and any provided per-instance text metadata such as sex and age. This dataset is the largest public repository of fundus images with glaucoma.

0 papers · 0 benchmarks · Images, Medical, Tabular

RAISE-LPBF

Laser powder bed fusion (LPBF) is an additive manufacturing (3D printing) process for metals. RAISE-LPBF is a large dataset on the effect of laser power and laser dot speed in 316L stainless steel bulk material. Both process parameters are independently sampled for each scan line from a continuous distribution, so interactions of different parameter choices can be investigated. Process monitoring comprises on-axis high-speed (20k FPS) video. The data can be used to derive statistical properties of LPBF as well as to build anomaly detectors.

0 papers · 0 benchmarks · Images, Physics, Videos

Volumetric CMR Cartesian Datasets (Free-running self-gated 3D cine, 4D Flow and stress 4D Flow Undersampled Datasets)

Datasets hosted at https://zenodo.org/record/8105485, accompanying the motion-robust CMR reconstruction code at https://github.com/syedmurtazaarshad/motion-robust-CMR.

0 papers · 0 benchmarks · Biomedical, Images, MRI

ELAI-Dust Storm (ELAI Dust Storm Dataset from MODIS)

Context: as mentioned in the reference paper.

0 papers · 0 benchmarks · Images

NEMO (NEMO: A Database for Emotion Analysis Using Functional Near-Infrared Spectroscopy)

We present a dataset for the analysis of human affective states using functional near-infrared spectroscopy (fNIRS). Data were recorded from thirty-one participants who engaged in two tasks. In the emotional perception task, the participants passively viewed images sampled from the standard International Affective Picture System database, which provides ground-truth valence and arousal annotations for the stimuli. In the affective imagery task, the participants actively imagined emotional scenarios and then rated them for subjective valence and arousal. Correlations between the fNIRS signal and the valence-arousal ratings were investigated to estimate the validity of the dataset. Source code and summaries are provided for a processing pipeline, brain-activity group analysis, and baseline classification performance estimation. For classification, prediction experiments are conducted for single-trial 4-class classification of arousal and valence, as well as cross-participant classification.

0 papers · 0 benchmarks · Images

ManiCups

Multi-domain Image Editing Benchmark

0 papers · 0 benchmarks · Images

HEADSET (HEADSET: Human Emotion Awareness under Partial Occlusions Multimodal DataSET)

The volumetric representation of human interactions is one of the fundamental domains in the development of immersive media productions and telecommunication applications. Particularly in the context of the rapid advancement of Extended Reality (XR) applications, volumetric data has proven to be an essential technology for future XR development. In this work, we present a new multimodal database to help advance the development of immersive technologies. Our proposed database provides ethically compliant and diverse volumetric data, in particular 27 participants displaying posed facial expressions and subtle body movements while speaking, plus 11 participants wearing head-mounted displays (HMDs). The recording system consists of a volumetric capture (VoCap) studio, including 31 synchronized modules with 62 RGB cameras and 31 depth cameras. In addition to textured meshes, point clouds, and multi-view RGB-D data, we use one Lytro Illum camera to provide light field (LF) data simultaneously.

0 papers · 0 benchmarks · 3D, 3d meshes, Audio, Images, Point cloud, RGB Video, RGB-D, Videos

withoutbg100 dataset (withoutbg100 Dataset for Image Matting)

The withoutbg100 dataset consists of 100 image and alpha matte pairs. These pairs are chosen to represent a wide range of subjects and complexities, specifically crafted to enhance and test the capabilities of image background removal algorithms. The dataset includes images with complex elements such as fur and objects with varying transparency levels, providing a substantial challenge to even advanced matting techniques.

0 papers · 0 benchmarks · Images
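Background removal with an alpha matte comes down to standard alpha compositing; a minimal NumPy sketch of the formula (not the dataset's own tooling):

```python
import numpy as np

def composite(fg: np.ndarray, bg: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Standard alpha compositing: out = alpha * fg + (1 - alpha) * bg.
    fg/bg are HxWx3 float images in [0, 1]; alpha is HxW in [0, 1]."""
    a = alpha[..., None]            # broadcast alpha over the color channels
    return a * fg + (1.0 - a) * bg

# Toy example: a half-transparent pixel blends the two colors equally.
fg = np.ones((1, 1, 3))            # white foreground
bg = np.zeros((1, 1, 3))           # black background
out = composite(fg, bg, np.full((1, 1), 0.5))
print(out.ravel())                 # [0.5 0.5 0.5]
```

Fur and semi-transparent objects, as in this dataset, are exactly the cases where the fractional alpha values in this formula matter most.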