Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3d meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • Midi (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • Cad (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

PIE-Bench (Prompt-based Image Editing Benchmark)

PIE-Bench comprises 700 images featuring 10 distinct editing types. Images are evenly distributed across natural and artificial scenes (e.g., paintings) and span four categories: animal, human, indoor, and outdoor. Each image in PIE-Bench includes five annotations: source image prompt, target image prompt, editing instruction, main editing body, and editing mask. Notably, the editing mask annotation (indicating the anticipated editing region) is crucial for accurate metric computation, since edits are expected to occur only within the designated area.

23 papers · 16 benchmarks · Images, Texts
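The five annotations per image suggest a simple record structure. The sketch below is a hypothetical representation (field names and mask layout are assumptions, not the official PIE-Bench schema) showing how the editing mask can restrict metric computation to the annotated region:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PIEBenchSample:
    # Hypothetical field names for the five annotations listed above.
    source_prompt: str        # prompt describing the source image
    target_prompt: str        # prompt describing the desired edited image
    edit_instruction: str     # natural-language editing instruction
    edit_body: str            # main editing body (what should change)
    edit_mask: List[List[int]]  # binary mask; 1 marks the anticipated edit region

def masked_region_fraction(sample: PIEBenchSample) -> float:
    """Fraction of pixels inside the annotated edit region (used to
    restrict metrics to the designated area)."""
    total = sum(len(row) for row in sample.edit_mask)
    inside = sum(sum(row) for row in sample.edit_mask)
    return inside / total

sample = PIEBenchSample(
    source_prompt="a cat on a sofa",
    target_prompt="a dog on a sofa",
    edit_instruction="replace the cat with a dog",
    edit_body="cat",
    edit_mask=[[0, 1], [0, 1]],  # toy 2x2 mask: right half is editable
)
assert masked_region_fraction(sample) == 0.5
```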

DSEC (A Stereo Event Camera Dataset for Driving Scenarios)

DSEC is a stereo camera dataset in driving scenarios that contains data from two monochrome event cameras and two global shutter color cameras in favorable and challenging illumination conditions. In addition, we collect Lidar data and RTK GPS measurements, both hardware synchronized with all camera data. One of the distinctive features of this dataset is the inclusion of VGA-resolution event cameras. Event cameras have received increasing attention for their high temporal resolution and high dynamic range performance. However, due to their novelty, event camera datasets in driving scenarios are rare. This work presents the first high-resolution, large-scale stereo dataset with event cameras.

23 papers · 7 benchmarks

Real 3D-AD

Real3D-AD is the first point cloud anomaly detection dataset for industrial products. It comprises 1,254 samples distributed across 12 distinct categories: Airplane, Car, Candybar, Chicken, Diamond, Duck, Fish, Gemstone, Seahorse, Shell, Starfish, and Toffees. Each training sample is a complete scan free of blind spots, serving as a realistic, high-accuracy prototype.

23 papers · 7 benchmarks · 3D, Point cloud

ISNotes

The ISNotes dataset is a corpus used for fine-grained Information Status (IS) classification. IS reflects the accessibility of a discourse entity based on the evolving discourse context and the speaker’s assumption about the hearer’s knowledge and beliefs. According to Markert et al. (2012), old mentions refer to entities that have been referred to previously; mediated mentions have not been mentioned before but are accessible to the hearer by reference to another old mention or to prior world knowledge; and new mentions refer to entities that are introduced to the discourse for the first time and are not known to the hearer before.

23 papers · 0 benchmarks

BEAT2 (BEAT-SMPLX-FLAME)

We propose EMAGE, a framework to generate full-body human gestures from audio and masked gestures, encompassing facial, local body, hand, and global movements. To achieve this, we first introduce BEAT2 (BEAT-SMPLX-FLAME), a new mesh-level holistic co-speech dataset. BEAT2 combines MoShed SMPL-X body parameters with FLAME head parameters and further refines the modeling of head, neck, and finger movements, offering a community-standardized, high-quality 3D motion-capture dataset. EMAGE leverages masked body gesture priors during training to boost inference performance. It involves a Masked Audio Gesture Transformer that enables joint training on audio-to-gesture generation and masked gesture reconstruction, effectively encoding audio and body gesture hints. Encoded body hints from masked gestures are then separately employed to generate facial and body movements. Moreover, EMAGE adaptively merges speech features from the audio's rhythm and content and utilizes four compositional VQ-VAEs to enhance the fidelity and diversity of the results.

23 papers · 9 benchmarks · 3d meshes, Audio, Texts, Time series

NExT-GQA

We study visually grounded VideoQA in response to the emerging trends of utilizing pretraining techniques for video-language understanding. Specifically, by forcing vision-language models (VLMs) to answer questions and simultaneously provide visual evidence, we seek to ascertain the extent to which the predictions of such techniques are genuinely anchored in relevant video content, versus spurious correlations from language or irrelevant visual context. Towards this, we construct NExT-GQA -- an extension of NExT-QA with 10.5K temporal grounding (or location) labels tied to the original QA pairs. With NExT-GQA, we scrutinize a variety of state-of-the-art VLMs. Through post-hoc attention analysis, we find that these models are weak in substantiating the answers despite their strong QA performance. This exposes a severe limitation of these models in making reliable predictions.

23 papers · 2 benchmarks · Texts, Videos

NumGLUE

The NumGLUE dataset, developed by the Allen Institute for AI, evaluates how well AI systems perform mathematical reasoning over numbers embedded in natural language text.

23 papers · 0 benchmarks

CharXiv

CharXiv is a comprehensive evaluation suite for testing the chart understanding capabilities of Multimodal Large Language Models (MLLMs). It was proposed to address the limitations of existing datasets that often focus on oversimplified and homogeneous charts with template-based questions.

23 papers · 0 benchmarks

FineDiving

We construct a fine-grained video dataset organized by both semantic and temporal structures, where each structure contains two-level annotations.

23 papers · 2 benchmarks

4D-DRESS (A 4D Dataset of Real-world Human Clothing with Semantic Annotations)

4D-DRESS is the first real-world 4D dataset of human clothing, capturing 64 human outfits in more than 520 motion sequences. These sequences include a) high-quality 4D textured scans; for each scan, we annotate b) vertex-level semantic labels, thereby obtaining c) the corresponding garment meshes and fitted SMPL(-X) body meshes. In total, 4D-DRESS captures the dynamic motions of 4 dresses, 28 lower, 30 upper, and 32 outer garments. For each garment, we also provide its canonical template mesh to support future research on human clothing.

23 papers · 11 benchmarks · 3D, 3d meshes, Videos

TimeQuestions

Question answering over knowledge graphs (KG-QA) is a vital topic in IR. Questions with temporal intent are a special class of practical importance, but have not received much attention in research. We present EXAQT, the first end-to-end system for answering complex temporal questions that have multiple entities and predicates, and associated temporal conditions.

23 papers · 1 benchmark

Text8

Text8 is a character-level language modeling corpus consisting of the first 100 MB of a cleaned English Wikipedia dump, preprocessed to contain only lowercase letters a–z and spaces. It is widely used to benchmark character-level language models and word embeddings.

22 papers · 2 benchmarks · Texts
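Because text8 is a single stream of lowercase letters and spaces, building a character-level vocabulary and encoding is trivial. A minimal sketch (the inline sample string stands in for the actual corpus file, which is not downloaded here):

```python
def char_vocab(text: str) -> dict:
    """Map each distinct character to an integer id, in sorted order
    (space sorts before the letters)."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}

def encode(text: str, vocab: dict) -> list:
    """Encode a string as a list of integer character ids."""
    return [vocab[ch] for ch in text]

# Stand-in for reading the real 100 MB text8 file.
sample = "anarchism originated as a term of abuse"
vocab = char_vocab(sample)
ids = encode(sample, vocab)

# text8 has at most 27 symbols: 'a'-'z' plus space.
assert len(vocab) <= 27
assert len(ids) == len(sample)
```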

CUFSF (CUHK Face Sketch FERET Database)

The CUHK Face Sketch FERET database (CUFSF) is a dataset for research on face sketch synthesis and face sketch recognition. It contains two types of face images: photos and sketches. In total, 1,194 photos (one per subject) with lighting variations were collected from the FERET dataset; for each subject, a sketch was drawn with shape exaggeration.

22 papers · 24 benchmarks · Images

iPinYou (iPinYou Global RTB Bidding Algorithm Competition Dataset)

The iPinYou Global RTB (Real-Time Bidding) Bidding Algorithm Competition was organized by iPinYou from April 1 to December 31, 2013, and was divided into three seasons. For each season, a training dataset was released to the participants, while the testing dataset was reserved by iPinYou. The complete testing dataset was randomly split into two parts: a leaderboard testing set used to score and rank the participating teams, and a reserved set used for the final offline evaluation of each team's last submission. This dataset contains the training datasets and leaderboard testing datasets of all three seasons; the reserved testing datasets are withheld by iPinYou. The training data comprise processed iPinYou DSP bidding, impression, click, and conversion logs.

22 papers · 2 benchmarks

Django

The Django dataset is a dataset for code generation comprising 16,000 training, 1,000 development, and 1,805 test annotations. Each data point consists of a line of Python code together with a manually created natural language description.

22 papers · 2 benchmarks · Texts

PTB Diagnostic ECG Database

The ECGs in this collection were obtained using a non-commercial PTB prototype recorder.

22 papers · 10 benchmarks · Medical

A3D (AnAn Accident Detection)

A new dataset of diverse traffic accidents.

22 papers · 1 benchmark · Videos

AMR Bank (Abstract Meaning Representation)

The AMR Bank is a set of English sentences paired with simple, readable semantic representations. Version 3.0 released in 2020 consists of 59,255 sentences.

22 papers · 0 benchmarks · Texts

SKU110K

The SKU110K dataset provides 11,762 images with more than 1.7 million annotated bounding boxes captured in densely packed scenes, including 8,233 images for training, 588 for validation, and 2,941 for testing. There are around 1,733,678 instances in total. The images were collected from thousands of supermarket stores and vary in scale, viewing angle, lighting condition, and noise level. All images are resized to a resolution of one megapixel. Most instances are tightly packed and typically oriented within the range of [−15°, 15°].

22 papers · 0 benchmarks · Images
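The quoted counts imply a very high instance density. A quick back-of-the-envelope check, using only the numbers in the description above:

```python
# Split sizes and total box count as quoted in the SKU110K description.
images = {"train": 8233, "val": 588, "test": 2941}
total_images = sum(images.values())   # 11,762 images in total
total_boxes = 1_733_678               # annotated bounding boxes

boxes_per_image = total_boxes / total_images
assert total_images == 11_762
# Roughly 147 tightly packed instances per image on average.
assert round(boxes_per_image) == 147
```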

LOCATA

The LOCATA dataset is a dataset for acoustic source localization. It consists of real-world ambisonic speech recordings with optically tracked azimuth-elevation labels.

22 papers · 0 benchmarks · Audio, Images
Page 96 of 1000