Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3d meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • Midi (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • Cad (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

DAVIS-S

To enrich the diversity, we also collect 92 images suitable for saliency detection from DAVIS [27], a densely annotated high-resolution video segmentation dataset. Images in this dataset are precisely annotated and have very high resolutions (i.e., 1920×1080). We ignore the categories of the objects and generate saliency ground-truth masks for this dataset. For convenience, the collected dataset is denoted as DAVIS-S.

10 papers · 24 benchmarks

Code2Seq (Java)

Java-Small, Java-Med, Java-Large

10 papers · 0 benchmarks

EmoWOZ

EmoWOZ is the first large-scale open-source dataset for emotion recognition in task-oriented dialogues. It contains emotion annotations for user utterances in the entire MultiWOZ (10k+ human-human dialogues) and DialMAGE (1k human-machine dialogues collected from our human trial). Overall, there are 83k user utterances annotated. In addition, the emotion annotation scheme is tailored to task-oriented dialogues and considers the valence, the elicitor, and the conduct of the user emotion.

10 papers · 1 benchmark · Texts

Perception Test

Perception Test is a benchmark designed to evaluate the perception and reasoning skills of multimodal models. It introduces real-world videos designed to show perceptually interesting situations and defines multiple tasks that require understanding of memory, abstract patterns, physics, and semantics, across visual, audio, and text modalities. The benchmark consists of 11.6k videos (23 s average length) filmed by around 100 participants worldwide. The videos are densely annotated with six types of labels: object and point tracks, temporal action and sound segments, multiple-choice video question-answers, and grounded video question-answers. The benchmark probes pre-trained models for their transfer capabilities in a zero-shot, few-shot, or fine-tuning regime.

10 papers · 4 benchmarks · Videos

UGIF

UGIF is a multi-lingual, multi-modal, UI-grounded dataset for step-by-step task completion on smartphones. It contains 523 natural language instructions paired with sequences of multilingual UI screens and actions that show how to execute the task in eight languages.

10 papers · 0 benchmarks · Speech, Texts

BAF (Bank Account Fraud)

Bank Account Fraud (BAF) is a large-scale, realistic suite of tabular datasets. The suite was generated by applying state-of-the-art tabular data generation techniques on an anonymized, real-world bank account opening fraud detection dataset.

10 papers · 0 benchmarks · Tabular

CropAndWeed

The CropAndWeed dataset is focused on the fine-grained identification of 74 relevant crop and weed species with a strong emphasis on data variability. Annotations of labeled bounding boxes, semantic masks and stem positions are provided for about 112k instances in more than 8k high-resolution images of both real-world agricultural sites and specifically cultivated outdoor plots of rare weed types. Additionally, each sample is enriched with meta-annotations regarding environmental conditions.

10 papers · 0 benchmarks · Images

SymphonyNet

The first large-scale symphony generation dataset.

10 papers · 0 benchmarks · Midi, Music

RealDOF (Single Image Defocus Deblurring)

This dataset consists of 50 high-resolution image pairs captured with a dual-camera setup for single-image defocus deblurring. Note that this is not a training set but a benchmark for evaluation.

10 papers · 0 benchmarks

OLIVES Dataset (Ophthalmic Labels for Investigating Visual Eye Semantics)

Clinical diagnosis of the eye is performed over multifarious data modalities, including scalar clinical labels, vectorized biomarkers, two-dimensional fundus images, and three-dimensional Optical Coherence Tomography (OCT) scans. While the clinical labels, fundus images, and OCT scans are instrumental measurements, the vectorized biomarkers are attributes interpreted from the other measurements. Clinical practitioners use all these data modalities for diagnosing and treating eye diseases like Diabetic Retinopathy (DR) or Diabetic Macular Edema (DME). Enabling the use of machine learning algorithms within the ophthalmic medical domain requires research into the relationships and interactions between these data modalities. Existing datasets are limited in that: (i) they view the problem as disease prediction without assessing biomarkers, and (ii) they do not consider the explicit relationship among all four data modalities over the treatment period. In this paper, we introduce the OLIVES dataset.

10 papers · 0 benchmarks · Medical

UEA time-series datasets (UEA time-series datasets for series-level anomaly detection)

Five datasets used in the NeurTraL-AD paper:

  • RacketSports (RS): accelerometer and gyroscope recordings of players playing four different racket sports; each sport is designated as a different class.
  • Epilepsy (EPSY): accelerometer recordings of healthy actors simulating four different activity classes, one of which is an epileptic shock.
  • Naval air training and operating procedures standardization (NAT): positions of sensors mounted on different body parts of a person performing activities; there are six activity classes in the dataset.
  • Character trajectories (CT): velocity trajectories of a pen on a WACOM tablet; there are 20 different characters in this dataset.
  • Spoken Arabic Digits (SAD): MFCC features of ten Arabic digits spoken by 88 different speakers.

10 papers · 1 benchmark · Time series

MGTAB (Multi-Relational Graph-Based Twitter Account Detection Benchmark)

MGTAB is the first standardized graph-based benchmark for stance and bot detection. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. For more details, please refer to the MGTAB paper.

10 papers · 4 benchmarks

Mocheg

A large-scale dataset that consists of 21,184 claims, where each claim is assigned a truthfulness label and ruling statement, with 58,523 pieces of evidence in the form of text and images. It supports end-to-end multimodal fact-checking and explanation generation, where the input is a claim and a large collection of web sources, including articles, images, videos, and tweets. The goal is to assess the truthfulness of the claim by retrieving relevant evidence, predicting a truthfulness label (i.e., support, refute, or not enough information), and generating a rationalization statement to explain the reasoning and ruling process.

10 papers · 0 benchmarks · Images, Texts

MP-DocVQA (Multipage Document Visual Question Answering)

The dataset targets Visual Question Answering on multipage industry scanned documents. The questions and answers are reused from the Single Page DocVQA (SP-DocVQA) dataset. The images correspond to the same documents as in the original dataset, extended with the preceding and following pages up to a limit of 20 pages per document.

10 papers · 0 benchmarks · Images, Texts

Dynamic Replica

Dynamic Replica is a synthetic dataset of stereo videos featuring humans and animals in virtual environments. It is a benchmark for dynamic disparity/depth estimation and 3D reconstruction consisting of 145,200 stereo frames (524 videos).

10 papers · 0 benchmarks · RGB-D, Videos

NLI4CT

The NLI4CT dataset consists of 2,400 annotated statements with accompanying labels, CTRs, and evidence, split into 1,700 training, 500 test, and 200 development instances. The two labels and four CTR section prompts are equally distributed across the dataset and its splits.

10 papers · 0 benchmarks · Texts

DOTA 2.0 (Dataset of Object deTection in Aerial images)

In the past decade, object detection has achieved significant progress in natural images but not in aerial images, due to the massive variations in the scale and orientation of objects caused by the bird's-eye view of aerial images. More importantly, the lack of large-scale benchmarks has become a major obstacle to the development of object detection in aerial images (ODAI). In this paper, we present a large-scale Dataset of Object deTection in Aerial images (DOTA) and comprehensive baselines for ODAI. The proposed DOTA dataset contains 1,793,658 object instances of 18 categories with oriented-bounding-box annotations, collected from 11,268 aerial images. Based on this large-scale and well-annotated dataset, we build baselines covering 10 state-of-the-art algorithms with over 70 configurations, where the speed and accuracy of each model have been evaluated. Furthermore, we provide a code library for ODAI and build a website for evaluating different algorithms.

10 papers · 0 benchmarks · Images

InDL (In-Diagram Logic)

Dataset Introduction

10 papers · 1 benchmark · Images

Synthetic Graph

We include five substructure counting tasks: 3-stars, triangles, tailed triangles, chordal cycles, and attributed triangles. 3-stars is a subgraph-counting task, while the remaining four are induced-subgraph-counting tasks.
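To make the counting tasks concrete, here is a minimal sketch of counting two of these substructures (triangles and 3-stars) on a graph stored as adjacency sets. The function and variable names are illustrative assumptions, not taken from the dataset's release code.

```python
from math import comb

def count_triangles(adj):
    """Count triangles in an undirected graph given as {node: set_of_neighbours}.

    For each edge (u, v) with u < v, every common neighbour w forms a triangle
    {u, v, w}. Each triangle has three edges, so the raw total over-counts
    by a factor of 3.
    """
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                total += len(adj[u] & adj[v])
    return total // 3

def count_3stars(adj):
    """Count 3-stars: a centre node plus any 3 of its neighbours."""
    return sum(comb(len(neigh), 3) for neigh in adj.values())

# The complete graph K4 contains 4 triangles and 4 three-stars.
k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
print(count_triangles(k4), count_3stars(k4))  # 4 4
```

The triangle count here counts induced subgraphs (a triangle is always induced), whereas the 3-star count is plain subgraph counting: a centre's three neighbours may themselves be connected, which is exactly the distinction the task descriptions draw.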

10 papers · 0 benchmarks

GUG ("Grammatical" versus "UnGrammatical")

See the article for details.

10 papers · 0 benchmarks
Page 156 of 1000