Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

CLIMATE-FEVER

A new publicly available dataset for verification of climate change-related claims.

38 papers | 1 benchmark

EMOTIC (EMOTIons in Context)

The EMOTIC dataset, named after EMOTions In Context, is a database of images with people in real environments, annotated with their apparent emotions. The images are annotated with an extended list of 26 emotion categories combined with the three common continuous dimensions Valence, Arousal and Dominance.

38 papers | 2 benchmarks | Images

FewRel 2.0

A more challenging task to investigate two aspects of few-shot relation classification models: (1) Can they adapt to a new domain with only a handful of instances? (2) Can they detect none-of-the-above (NOTA) relations?

38 papers | 0 benchmarks

HDD (Honda Research Institute Driving Dataset)

Honda Research Institute Driving Dataset (HDD) is a dataset to enable research on learning driver behavior in real-life environments. The dataset includes 104 hours of real human driving in the San Francisco Bay Area collected using an instrumented vehicle equipped with different sensors.

38 papers | 0 benchmarks

InsuranceQA

InsuranceQA is a question answering dataset for the insurance domain, built from data on the Insurance Library website. The training set has 12,889 questions and 21,325 answers, the validation set has 2,000 questions and 3,354 answers, and the test set has 2,000 questions and 3,308 answers.

38 papers | 0 benchmarks | Texts

CUAD (Contract Understanding Atticus Dataset)

Contract Understanding Atticus Dataset (CUAD) is a dataset for legal contract review. CUAD was created with dozens of legal experts from The Atticus Project and consists of over 13,000 annotations. The task is to highlight salient portions of a contract that are important for a human to review.

38 papers | 0 benchmarks | Texts

AISHELL-3

AISHELL-3 is a large-scale, high-fidelity multi-speaker Mandarin speech corpus that can be used to train multi-speaker Text-to-Speech (TTS) systems. The corpus contains roughly 85 hours of emotion-neutral recordings spoken by 218 native Mandarin Chinese speakers, totaling 88,035 utterances. Auxiliary speaker attributes such as gender, age group and native accent are explicitly marked and provided in the corpus. Transcripts at both the Chinese character level and the pinyin level are provided along with the recordings. The word and tone transcription accuracy is above 98%, achieved through professional speech annotation and strict quality inspection for tone and prosody.

38 papers | 0 benchmarks | Speech, Texts

CVEfixes

CVEfixes is a comprehensive vulnerability dataset that is automatically collected and curated from Common Vulnerabilities and Exposures (CVE) records in the public U.S. National Vulnerability Database (NVD). The goal is to support data-driven security research based on source code and source code metrics related to fixes for CVEs in the NVD by providing detailed information at different interlinked levels of abstraction, such as the commit-, file-, and method level, as well as the repository- and CVE level.

38 papers | 0 benchmarks

ManiSkill

ManiSkill is a large-scale learning-from-demonstrations benchmark for articulated object manipulation with visual input (point cloud and image). ManiSkill supports object-level variations by utilizing a rich and diverse set of articulated objects, and each task is carefully designed for learning manipulations on a single category of objects. ManiSkill is equipped with high-quality demonstrations to facilitate learning-from-demonstrations approaches and perform evaluations on common baseline algorithms. ManiSkill can encourage the robot learning community to explore more on learning generalizable object manipulation skills.

38 papers | 0 benchmarks | Environment

WaveFake

WaveFake is a dataset for audio deepfake detection. It consists of over 100K generated audio clips.

38 papers | 0 benchmarks | Audio

WORD (Whole abdominal Organs Dataset)

WORD is a dataset for organ semantic segmentation that contains 150 abdominal CT volumes (30,495 slices). Each volume has 16 organs with fine pixel-level annotations and scribble-based sparse annotations. It may be the largest dataset with whole abdominal organ annotations.

38 papers | 0 benchmarks | Images, Medical

OpenML-CC18

We advocate the use of curated, comprehensive benchmark suites of machine learning datasets, backed by standardized OpenML-based interfaces and complementary software toolkits written in Python, Java and R, and we demonstrate how to easily execute comprehensive benchmarking studies with them. Major distinguishing features of OpenML benchmark suites are (i) ease of use through standardized data formats, APIs, and existing client libraries; (ii) machine-readable meta-information regarding the contents of the suite; and (iii) online sharing of results, enabling large-scale comparisons. As a first such suite, we propose the OpenML-CC18, a machine learning benchmark suite of 72 classification datasets carefully curated from the thousands of datasets on OpenML.

38 papers | 0 benchmarks

ConvFinQA (Conversational Finance Question Answering)

ConvFinQA is a dataset designed to study the chain of numerical reasoning in conversational question answering. The dataset contains 3,892 conversations with 14,115 questions; 2,715 of the conversations are simple conversations, and the remaining 1,177 are hybrid conversations.

38 papers | 4 benchmarks | Texts

DAIR-V2X

DAIR-V2X is a large-scale, multi-modality, multi-view dataset from real scenarios for Vehicle-Infrastructure Cooperative Autonomous Driving (VICAD). DAIR-V2X comprises 71,254 LiDAR frames and 71,254 camera frames, all captured from real scenes and provided with 3D annotations.

38 papers | 6 benchmarks

LTCC

LTCC contains 17,119 person images of 152 identities, and each identity is captured by at least two cameras. The dataset can be divided into two subsets: a cloth-change subset in which 91 persons appear with 416 different sets of outfits in 14,783 images, and a cloth-consistent subset containing the remaining 61 identities with 2,336 images without outfit changes. On average, each cloth-changing person has 5 different outfits, with the number of outfit changes ranging from 2 to 14.

38 papers | 4 benchmarks

tolokers

Tolokers is a network of crowdsourcing platform workers, based on data provided by Toloka.

38 papers | 1 benchmark | Graphs

RGBT234

The RGBT234 dataset is a comprehensive video dataset specifically designed for RGB-T (Red-Green-Blue and Thermal) tracking purposes. This dataset addresses the limitations of existing datasets like OSU-CT, LITIV, and GTOT in terms of size. RGBT234 consists of 234 RGB-T videos, each containing both an RGB video and a thermal video. The total number of frames in the dataset is approximately 234,000, with the largest video pair containing up to 8,000 frames. Each frame in the RGBT234 dataset is annotated with a minimum bounding box that covers the target for both the RGB and thermal modalities. The dataset also includes various environmental challenges such as rainy conditions, nighttime scenes, and cold and hot weather scenarios. To analyze the performance of different tracking algorithms based on specific attributes, the RGBT234 dataset annotates 12 attributes and provides baseline trackers, including both deep learning and non-deep learning methods like structured SVM, sparse representation

38 papers | 2 benchmarks | Tracking, Videos

BigCodeBench

BigCodeBench is an easy-to-use benchmark for code generation with practical and challenging programming tasks. It aims to evaluate the true programming capabilities of large language models (LLMs) in a more realistic setting. The benchmark is designed for HumanEval-like function-level code generation tasks, but with much more complex instructions and diverse function calls.

38 papers | 0 benchmarks

GSM-Hard

38 papers | 0 benchmarks

Watercolor2k

Watercolor2k is a dataset used for cross-domain object detection which contains 2k watercolor images with image and instance-level annotations.

37 papers | 10 benchmarks