Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

MultiScan

We introduce MultiScan, a scalable RGBD dataset construction pipeline that leverages commodity mobile devices to scan indoor scenes with articulated objects, together with web-based semantic annotation interfaces for efficiently annotating object and part semantics and part mobility parameters. We use this pipeline to collect 273 scans of 117 indoor scenes containing 10,957 objects and 5,129 parts. The resulting MultiScan dataset provides RGBD streams with per-frame camera poses, textured 3D surface meshes, richly annotated part-level and object-level semantic labels, and part mobility parameters. We validate our dataset on instance segmentation and part mobility estimation tasks, and benchmark methods for these tasks from prior work. Our experiments show that part segmentation and mobility estimation in real 3D scenes remain challenging despite recent progress in 3D object segmentation.

4 papers · 12 benchmarks · Images, Point cloud

LAM (line-level) (The LAM Dataset: A Novel Benchmark for Line-Level Handwritten Text Recognition)

Handwritten Text Recognition (HTR) is an open problem at the intersection of Computer Vision and Natural Language Processing. The main challenges when dealing with historical manuscripts are the preservation of the paper support, the variability of the handwriting (even of the same author over a wide time span), and the scarcity of data from ancient, poorly represented languages. To foster research on this topic, we present the Ludovico Antonio Muratori (LAM) dataset, a large line-level HTR dataset of Italian ancient manuscripts edited by a single author over 60 years. The dataset comes in two configurations: a basic split and a date-based split that takes into account the age of the author. The first setting is intended to study HTR on ancient documents in Italian, while the second focuses on the ability of HTR systems to recognize text written by the same writer in time periods for which training data are not available.

4 papers · 4 benchmarks · Images, Texts

t4d (Thinking is for Doing)

This dataset was generated by the code implementation found here: https://github.com/sachith-gunasekara/t4d

4 papers · 0 benchmarks

FakeMusicCaps

The FakeMusicCaps dataset contains a total of 27,605 ten-second music tracks (almost 77 hours of audio), generated using 5 different Text-To-Music (TTM) models. It is designed as a starting dataset for training and/or evaluating models for the detection and attribution of synthetic music generated via TTM models.

4 papers · 0 benchmarks · Audio

ADE-OoD

ADE-OoD is a public benchmark for dense out-of-distribution detection in general natural images. It measures the ability to detect and localize objects which are out-of-distribution with respect to the 150 categories of the ADE20k semantic segmentation dataset.

4 papers · 2 benchmarks · Images

RetVQA (Retrieval-Based Visual Question Answering)

The RetVQA dataset is a large-scale dataset designed for Retrieval-Based Visual Question Answering (RetVQA). RetVQA is a more challenging task than traditional VQA, as it requires models to retrieve relevant images from a pool of images before answering a question. The need for RetVQA stems from the fact that information needed to answer a question may be spread across multiple images.

4 papers · 2 benchmarks · Images, Texts

DevAI

DevAI is a benchmark of 55 realistic AI development tasks. It includes rich manual annotations, with a total of 365 hierarchical user requirements. The dataset enables rich reinforcement signals for better automated AI software development.

4 papers · 0 benchmarks

MassSpecGym (MassSpecGym: A benchmark for the discovery and identification of molecules)

MassSpecGym provides three challenges for benchmarking the discovery and identification of new molecules from MS/MS spectra.

4 papers · 28 benchmarks · Biology

MuseASTE (MuSe-CarASTE: A comprehensive dataset for aspect sentiment triplet extraction in automotive review videos)

  • A new benchmark dataset for Aspect Sentiment Triplet Extraction (ASTE).
  • First ASTE dataset in the automotive domain.
  • Largest ASTE dataset to date, with annotations for over 28,295 sentences.
  • Includes complex aspects not verbatim present in the sentence.
  • Domain: aspect-based sentiment analysis, ASTE, opinion mining, recommender systems.
  • Four baseline SOTA models implemented on the dataset.

4 papers · 1 benchmark · Texts

SCOUT: The Situated Corpus of Understanding Transactions

The Situated Corpus Of Understanding Transactions (SCOUT) is a multi-modal collection of human-robot dialogue in the task domain of collaborative exploration. The corpus was constructed from multi-phased Wizard-of-Oz experiments where human participants gave verbal instructions to a remotely-located robot to move and gather information about its surroundings. Each dialogue involved a human Commander, a Dialogue Manager (DM), and a Robot Navigator (RN), and took place in physical or simulated environments.

4 papers · 0 benchmarks · Dialog, Images, Interactive, LiDAR, Texts

QT-NSTDB (QT database + MIT-BIH Noise Stress Test Database (NSTDB))

We designed a baseline wander (BLW) removal benchmark to evaluate various methods on a consistent test set under uniform conditions. The data preprocessing pipeline closely follows [1]. All 105 signals from the QT Database were resampled from 250 Hz to 360 Hz to match the NSTDB sampling frequency. Heartbeats were extracted using the annotations provided by specialists. During this process, we identified a small number of incorrect annotations for beat start/end points, leading to cases where two consecutive beats were erroneously merged into one. To address this issue, we discarded beats exceeding 512 samples (1422.22 ms) in length. We designated heartbeats from 14 signals, accounting for 13% of the total, as the test set. These signals were selected to include two signals from each of the seven datasets comprising the QT Database, ensuring a diverse representation of pathologies in the test set.
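The resampling and beat-filtering steps described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: the linear resampler and the (start, end) annotation format are assumptions, and a production pipeline would likely use a higher-quality polyphase resampler.

```python
import numpy as np

FS_IN, FS_OUT = 250, 360   # QT Database rate -> NSTDB rate
MAX_BEAT_SAMPLES = 512     # 512 samples at 360 Hz = 1422.22 ms

def resample_to_nstdb(signal):
    """Linearly resample a 250 Hz signal to 360 Hz (illustrative only)."""
    n_out = int(round(len(signal) * FS_OUT / FS_IN))
    t_in = np.arange(len(signal)) / FS_IN
    t_out = np.arange(n_out) / FS_OUT
    return np.interp(t_out, t_in, signal)

def extract_beats(signal, boundaries):
    """Cut beats at annotated (start, end) sample indices at 360 Hz,
    dropping beats longer than MAX_BEAT_SAMPLES, which are likely two
    consecutive beats merged by an incorrect annotation."""
    return [signal[s:e] for s, e in boundaries if e - s <= MAX_BEAT_SAMPLES]
```

One second of 250 Hz signal becomes 360 samples, and a 1000-sample "beat" (2.78 s) would be discarded as a probable merged annotation.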

4 papers · 8 benchmarks · Medical

SDGym


4 papers · 0 benchmarks

FG-OVD (Fine-Grained Open-Vocabulary object Detection benchmarks)

The FG-OVD benchmark suite evaluates the ability of open-vocabulary object detectors to discern fine-grained object properties such as color, material, pattern, and transparency. The suite introduces dynamic vocabularies for each object, consisting of one positive caption and several challenging negative captions, crafted using attribute substitution at varying difficulty levels.
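The attribute-substitution idea can be sketched as follows. The attribute lists and caption template here are illustrative assumptions, not FG-OVD's actual vocabulary or generation code.

```python
# Hypothetical attribute pools; FG-OVD's real pools differ.
ATTRIBUTES = {
    "color": ["red", "blue", "green"],
    "material": ["wooden", "metal", "plastic"],
}

def make_negatives(caption, attr_type, value):
    """Build hard negative captions by swapping one attribute in the
    positive caption for each alternative of the same type."""
    return [
        caption.replace(value, alt)
        for alt in ATTRIBUTES[attr_type]
        if alt != value
    ]

positive = "a red wooden chair"
negatives = make_negatives(positive, "color", "red")
# e.g. ["a blue wooden chair", "a green wooden chair"]
```

Difficulty can then be controlled by how many attributes are substituted and how semantically close the substitutes are to the original.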

4 papers · 0 benchmarks

MMAUD (Multi-Modal Anti-UAV Dataset): https://github.com/ntu-aris/MMAUD

In response to the evolving challenges posed by small unmanned aerial vehicles (UAVs), which can transport harmful payloads or independently cause damage, we introduce MMAUD: a comprehensive Multi-Modal Anti-UAV Dataset. MMAUD addresses a critical gap in contemporary threat detection methodologies by focusing on drone detection, UAV-type classification, and trajectory estimation. It stands out by combining diverse sensory inputs, including stereo vision, various Lidars, Radars, and audio arrays. It offers an overhead aerial detection perspective vital for real-world scenarios, with higher fidelity than datasets captured from fixed vantage points using thermal and RGB cameras. Additionally, MMAUD provides accurate Leica-generated ground truth, enhancing credibility and enabling confident refinement of algorithms and models. Most existing works do not disclose their datasets, making MMAUD an invaluable resource.

4 papers · 0 benchmarks

IBM Transactions for Anti Money Laundering

Money laundering is a multi-billion dollar issue, and detecting it is very difficult. Most automated algorithms have a high false positive rate: legitimate transactions incorrectly flagged as laundering. The converse is also a major problem: false negatives, i.e. undetected laundering transactions. Naturally, criminals work hard to cover their tracks.
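The two error types described above can be made concrete with a small sketch. The function and label convention (1 = laundering) are assumptions for illustration, not part of the dataset's tooling.

```python
def fp_fn_rates(y_true, y_pred):
    """Compute the false positive rate (legitimate transactions flagged
    as laundering) and false negative rate (laundering missed).
    Labels: 1 = laundering, 0 = legitimate."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    return fp / negatives, fn / positives
```

Because laundering is rare, even a tiny false positive rate can swamp investigators with flagged legitimate transactions, which is why both rates matter when benchmarking detectors on this data.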

4 papers · 0 benchmarks · Graphs, Time series

MixSet (Mixcase Dataset)

MixSet comprises a total of 3.6k mixed-text ("mixtext") instances. The dataset features a blend of HWT (human-written text) and MGT (machine-generated text).

4 papers · 0 benchmarks · Texts

I-SHEEP


4 papers · 0 benchmarks

TexBiG (Text-Bild-Gefüge)

TexBiG (from the German Text-Bild-Gefüge, meaning text-image-structure) is a document layout analysis dataset for historical documents from the late 19th and early 20th century. The dataset provides instance segmentation annotations (bounding boxes and polygons/masks) for 19 different classes with more than 52,000 instances.

4 papers · 0 benchmarks · Images

ProofNet#

ProofNet# is an evaluation benchmark derived from the original ProofNet, which contains 371 paired examples of informal undergraduate mathematical statements and their corresponding formalizations. Updated for Lean 4, ProofNet# corrects formalization errors and retains the original structure and content.

4 papers · 0 benchmarks · Texts

DiaTrend

The DiaTrend dataset is composed of intensive longitudinal data from wearable medical devices, including a total of 27,561 days of continuous glucose monitor data and 8,220 days of insulin pump data from 54 patients with diabetes. This dataset is useful for developing novel analytic solutions that can reduce the disease burden for people living with diabetes and increase knowledge on chronic condition management in outpatient settings. The dataset is accessible on Synapse.

4 papers · 0 benchmarks
Page 256 of 1000