Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

JHU-CROWD++

JHU-CROWD++ is a large-scale unconstrained crowd counting dataset with 4,372 images and 1.51 million annotations. The dataset was collected under a variety of diverse scenarios and environmental conditions, and provides a comparatively rich set of annotations, including dots, approximate bounding boxes, and blur levels.

48 papers · 3 benchmarks · Images

DNS Challenge (Deep Noise Suppression Challenge)

The DNS Challenge at INTERSPEECH 2020 was intended to promote collaborative research in single-channel speech enhancement aimed at maximizing the perceptual quality and intelligibility of the enhanced speech. The challenge evaluated speech quality using the ITU-T P.808 online subjective evaluation framework, and it provides large datasets for training noise suppressors.

48 papers · 2 benchmarks

SRD (Shadow Removal Dataset)

SRD is a dataset for shadow removal that contains 3,088 shadow and shadow-free image pairs.

48 papers · 12 benchmarks · Images

VOICES (Voices Obscured In Complex Environmental Settings)

The VOICES corpus is a dataset to promote speech and signal processing research on speech recorded by far-field microphones in noisy room conditions.

48 papers · 0 benchmarks · Audio, Speech

CUHK Avenue

The Avenue Dataset contains 16 training and 21 testing video clips. The videos were captured on the CUHK campus avenue, with 30,652 frames in total (15,328 training, 15,324 testing).

48 papers · 10 benchmarks · RGB Video

BA-2motifs

BA-2motifs is a synthetic dataset containing 1,000 graphs divided into two classes according to the motif they contain: either a “house” or a five-node cycle.

48 papers · 1 benchmark
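As a rough illustration of the two graph classes described above (a hypothetical sketch, not the original BA-2motifs generator), each motif can be written as a plain edge list over five nodes:

```python
# Hypothetical sketch of the two 5-node motif classes in BA-2motifs:
# a cycle vs. a "house". Not the dataset's actual generation code.

def cycle_motif(n=5):
    """Edges of an n-node cycle: 0-1-2-...-(n-1)-0."""
    return [(i, (i + 1) % n) for i in range(n)]

def house_motif():
    """Edges of a 5-node 'house': a 4-cycle body plus a roof node 4."""
    return [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (1, 4)]

print(cycle_motif())  # -> [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
```

In the dataset itself, each motif is attached to a random base graph; a classifier (and an explainer) then has to locate the motif inside the larger graph.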

Reddit TIFU

Reddit TIFU is a dataset collected from Reddit, where TIFU denotes the name of the /r/tifu subreddit. It contains 122,933 text-summary pairs in total.

47 papers · 3 benchmarks · Texts

MedleyDB

MedleyDB is a dataset of annotated, royalty-free multitrack recordings. It was curated primarily to support research on melody extraction. For each song, melody f₀ annotations are provided, as well as instrument activations for evaluating automatic instrument recognition. The original dataset consists of 122 multitrack songs, 108 of which include melody annotations.

47 papers · 0 benchmarks · Audio

QUASAR (QUestion Answering by Search And Reading)

QUASAR (QUestion Answering by Search And Reading) is a large-scale dataset consisting of QUASAR-S and QUASAR-T. Each is built to evaluate systems that must understand a natural language query, read a large corpus of text, and extract an answer to the question from that corpus. Specifically, QUASAR-S comprises 37,012 fill-in-the-gap questions collected from the popular website Stack Overflow using entity tags. QUASAR-T contains 43,012 open-domain questions collected from various internet sources. The candidate documents for each question in this dataset are retrieved from an Apache Lucene-based search engine built on top of the ClueWeb09 dataset.

47 papers · 0 benchmarks · Texts

BillSum

BillSum is the first dataset for summarization of US Congressional and California state bills.

47 papers · 1 benchmark · Texts

ECHR

ECHR is an English legal judgment prediction dataset of cases from the European Court of Human Rights (ECHR). The dataset contains ~11.5k cases, including the raw text.

47 papers · 0 benchmarks · Texts

CityFlow

CityFlow is a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km. The dataset contains more than 200K annotated bounding boxes covering a wide range of scenes, viewing angles, vehicle models, and urban traffic flow conditions.

47 papers · 2 benchmarks · Images, Videos

WildDash

WildDash is a benchmark that presents an evaluation method using meta-information to calculate the robustness of a given algorithm with respect to individual visual hazards.

47 papers · 4 benchmarks

UAV-Human

UAV-Human is a large dataset for human behavior understanding with UAVs. It contains 67,428 multi-modal video sequences and 119 subjects for action recognition, 22,476 frames for pose estimation, 41,290 frames and 1,144 identities for person re-identification, and 22,263 frames for attribute recognition. The dataset was collected by a flying UAV in multiple urban and rural districts, in both daytime and nighttime, over three months, hence covering extensive diversity with respect to subjects, backgrounds, illumination, weather, occlusions, camera motions, and UAV flying attitudes. The dataset can be used for UAV-based human behavior understanding, including action recognition, pose estimation, re-identification, and attribute recognition.

47 papers · 38 benchmarks · RGB Video, RGB-D

BDD-X (Berkeley Deep Drive-X (eXplanation))

Berkeley Deep Drive-X (eXplanation) is a dataset composed of over 77 hours of driving within 6,970 videos. The videos are taken in diverse driving conditions, e.g. day/night, highway/city/countryside, and summer/winter. Each video is around 40 seconds long and contains around 3-4 actions, e.g. speeding up, slowing down, or turning right, all of which are annotated with a description and an explanation. The dataset contains over 26K activities in over 8.4M frames.

47 papers · 0 benchmarks

Gait3D

Gait3D is a large-scale 3D representation-based gait recognition dataset. It contains 4,000 subjects and over 25,000 sequences extracted from 39 cameras in an unconstrained indoor scene.

47 papers · 4 benchmarks · 3D, 3D meshes, Images

UBnormal (University of Bucharest Abnormal Videos)

UBnormal is a new supervised open-set benchmark composed of multiple virtual scenes for video anomaly detection. Unlike existing datasets, UBnormal introduces abnormal events annotated at the pixel level at training time, for the first time enabling the use of fully supervised learning methods for abnormal event detection. To preserve the typical open-set formulation, the dataset includes disjoint sets of anomaly types in the training and test collections of videos.

47 papers · 8 benchmarks
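The open-set protocol described above can be sketched in a few lines: anomaly *types* (not just videos) are partitioned disjointly between training and test, so test-time anomalies are never seen during training. The type names and split below are hypothetical, not UBnormal's actual lists.

```python
import random

def open_set_split(anomaly_types, train_fraction=0.5, seed=0):
    """Partition anomaly types into disjoint train/test sets (illustrative)."""
    types = sorted(anomaly_types)          # deterministic starting order
    random.Random(seed).shuffle(types)     # seeded shuffle for reproducibility
    cut = int(len(types) * train_fraction)
    return set(types[:cut]), set(types[cut:])

# Hypothetical anomaly types, split 50/50 with no overlap.
train_types, test_types = open_set_split(
    ["fighting", "jaywalking", "running", "falling"]
)
print(train_types.isdisjoint(test_types))  # -> True
```

Because the two sets of types are disjoint, a detector evaluated this way cannot simply memorize the training anomalies.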

MultiCoNER

MultiCoNER is a large multilingual dataset (11 languages) for Named Entity Recognition. It is designed to represent some of the contemporary challenges in NER, including low-context scenarios (short and uncased text), syntactically complex entities such as movie titles, and long-tail entity distributions.

47 papers · 0 benchmarks · Texts

IMPACT

The IMPACT dataset contains 50 human-created prompts for each category, 200 in total, to test the general writing ability of LLMs. Instruction-tuned LLMs demonstrate promising ability in writing-based tasks, such as composing letters or ethical debates. The dataset consists of prompts across 4 diverse usage scenarios:

  • Informative Writing: user queries such as self-help advice or explanations of various concepts
  • Professional Writing: formats such as suggestions, presentations, or emails in a business setting
  • Argumentative Writing: debate positions on ethical and societal questions
  • Creative Writing: diverse writing formats such as stories, poems, and songs

47 papers · 0 benchmarks

Elephant

The Elephant MIL dataset is a benchmark used in multiple instance learning (MIL), which falls under the broader categories of image classification and content-based image retrieval. The task is to determine whether an image contains an elephant. Each image is treated as a "bag," and within each bag the image is segmented into regions called "instances," represented by feature vectors that capture visual characteristics such as color, texture, and shape. A bag is labeled positive if at least one instance contains an elephant, and negative if none of the instances do. The dataset includes 200 images (bags) with a total of 1,220 segments (instances), averaging ~6.1 segments per image. The challenge is that only some segments in a positive image might actually show an elephant, so the goal is to correctly classify the entire image based on these segments. This dataset is widely used to evaluate MIL algorithms, especially in cases where only parts of the data might contain the relevant information.

47 papers · 2 benchmarks · Tabular
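The bag-labeling rule described above can be sketched directly (a minimal illustration only; the feature vectors and the actual MIL learner are omitted):

```python
# MIL bag-labeling rule: a bag (image) is positive iff at least one of its
# instances (segments) is positive. The toy bags below are hypothetical;
# a real MIL learner only ever observes the bag-level label.

def bag_label(instance_labels):
    """Return 1 if any instance in the bag is positive, else 0."""
    return int(any(instance_labels))

bags = [
    [0, 0, 1, 0],  # one segment shows an elephant -> positive bag
    [0, 0, 0],     # no elephant in any segment    -> negative bag
]
print([bag_label(b) for b in bags])  # -> [1, 0]
```

The asymmetry of this rule is what makes MIL hard: a negative bag constrains every instance, while a positive bag constrains only an unknown one.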
Page 59 of 1000