TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images3,275
  • Texts3,148
  • Videos1,019
  • Audio486
  • Medical395
  • 3D383
  • Time series298
  • Graphs285
  • Tabular271
  • Speech199
  • RGB-D192
  • Environment148
  • Point cloud135
  • Biomedical123
  • LiDAR95
  • RGB Video87
  • Tracking78
  • Biology71
  • Actions68
  • 3d meshes65
  • Tables52
  • Music48
  • EEG45
  • Hyperspectral images45
  • Stereo44
  • MRI39
  • Physics32
  • Interactive29
  • Dialog25
  • Midi22
  • 6D17
  • Replay data11
  • Financial10
  • Ranking10
  • Cad9
  • fMRI7
  • Parallel6
  • Lyrics2
  • PSG2

19,997 dataset results

SemEval-2017 Task-10

We describe the SemEval task of extracting keyphrases and relations between them from scientific documents, which is crucial for understanding which publications describe which processes, tasks and materials. Although this was a new task, we had a total of 26 submissions across 3 evaluation scenarios. We expect the task and the findings reported in this paper to be relevant for researchers working on understanding scientific content, as well as the broader knowledge base population and information extraction communities.

33 papers3 benchmarks

Wrench

Wrench is a benchmark platform for thorough and standardized evaluation of Weak Supervision (WS). It consists of 22 varied real-world datasets for classification and sequence tagging; a range of real, synthetic, and procedurally-generated weak supervision sources; and a modular, extensible framework for WS evaluation, including implementations for popular WS methods.

33 papers0 benchmarks

WMT 2020

WMT 2020 is a collection of datasets used in shared tasks of the Fifth Conference on Machine Translation. The conference builds on a series of annual workshops and conferences on Statistical Machine Translation.

33 papers0 benchmarksTexts

ClevrTex

ClevrTex is a new benchmark designed as the next challenge to compare, evaluate and analyze algorithms for unsupervised multi-object segmentation. ClevrTex features synthetic scenes with diverse shapes, textures and photo-mapped materials, created using physically based rendering techniques.

33 papers4 benchmarksImages

nuScenes-C

🤖 Robo3D - The nuScenes-C Benchmark nuScenes-C is an evaluation benchmark heading toward robust and reliable 3D perception in autonomous driving. With it, we probe the robustness of 3D detectors and segmentors under out-of-distribution (OoD) scenarios against corruptions that occur in the real-world environment. Specifically, we consider natural corruptions happen in the following cases:

33 papers9 benchmarksImages, Point cloud

AMIGOS (AMIGOS: A Dataset for Affect, Personality and Mood Research on Individuals and Groups)

We present a database for research on affect, personality traits and mood by means of neuro-physiological signals. Different to other databases, we elicited affect using both short and long videos in two configurations, one with individual viewers and one with groups of viewers. The database allows the multimodal study of the affective responses of individuals in relation to their personality and mood, and the analysis of how these responses are affected by (i) the individual/group configuration, and (ii) the duration of the videos (short vs long). The data is collected in two experimental settings. In the first one, 40 participants watched 16 short emotional videos while they were alone. In the second one, the same participants watched 4 long videos, some of them alone and the rest in groups. In both settings, the participants' signals, namely, Electroencephalogram (EEG), Electrocardiogram (ECG), and Galvanic Skin Response (GSR), were recorded using wearable sensors. Frontal, full-bod

33 papers4 benchmarksEEG

SciBench

SciBench is a large-scale scientific problem-solving benchmark suite that aims to systematically examine the reasoning capabilities required for complex scientific problem solving. SciBench contains two carefully curated datasets: an open set featuring a range of collegiate-level scientific problems drawn from mathematics, chemistry, and physics textbooks, and a closed set comprising problems from undergraduate-level exams in computer science and mathematics.

33 papers0 benchmarksTexts

MatrixCity

We build a large-scale, comprehensive, and high-quality synthetic dataset for city-scale neural rendering researches. Leveraging the Unreal Engine 5 City Sample project, we developed a pipeline to easily collect aerial and street city views with ground-truth camera poses, as well as a series of additional data modalities. Flexible control on environmental factors like light, weather, human and car crowd is also available in our pipeline, supporting the need of various tasks covering city-scale neural rendering and beyond. The resulting pilot dataset, MatrixCity, contains 67k aerial images and 452k street images from two city maps of total size 28km^2.

33 papers0 benchmarksImages, RGB-D

amazon-ratings

amazon-ratings is a product co-purchasing network based on data from SNAP datasets

33 papers1 benchmarksGraphs

Electricity (Individual household electric power consumption Data Set)

Abstract: Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available.

32 papers12 benchmarksTime series

SHREC (SHape REtrieval Contest)

The SHREC dataset contains 14 dynamic gestures performed by 28 participants (all participants are right handed) and captured by the Intel RealSense short range depth camera. Each gesture is performed between 1 and 10 times by each participant in two way: using one finger and the whole hand. Therefore, the dataset is composed by 2800 sequences captured. The depth image, with a resolution of 640x480, and the coordinates of 22 joints (both in the 2D depth image space and in the 3D world space) are saved for each frame of each sequence in the dataset.

32 papers0 benchmarks

MQ2007

The MQ2007 dataset consists of queries, corresponding retrieved documents and labels provided by human experts. The possible relevance labels for each document are “relevant”, “partially relevant”, and “not relevant”.

32 papers0 benchmarksRanking

XSum

The Extreme Summarization (XSum) dataset is a dataset for evaluation of abstractive single-document summarization systems. The goal is to create a short, one-sentence new summary answering the question “What is the article about?”. The dataset consists of 226,711 news articles accompanied with a one-sentence summary. The articles are collected from BBC articles (2010 to 2017) and cover a wide variety of domains (e.g., News, Politics, Sports, Weather, Business, Technology, Science, Health, Family, Education, Entertainment and Arts). The official random split contains 204,045 (90%), 11,332 (5%) and 11,334 (5) documents in training, validation and test sets, respectively.

32 papers2 benchmarksTexts

UCF-CC-50

UCF-CC-50 is a dataset for crowd counting and consists of images of extremely dense crowds. It has 50 images with 63,974 head center annotations in total. The head counts range between 94 and 4,543 per image. The small dataset size and large variance make this a very challenging counting dataset.

32 papers0 benchmarksImages

ImageNet-P

ImageNet-P consists of noise, blur, weather, and digital distortions. The dataset has validation perturbations; has difficulty levels; has CIFAR-10, Tiny ImageNet, ImageNet 64 Ă— 64, standard, and Inception-sized editions; and has been designed for benchmarking not training networks. ImageNet-P departs from ImageNet-C by having perturbation sequences generated from each ImageNet validation image. Each sequence contains more than 30 frames, so to counteract an increase in dataset size and evaluation time only 10 common perturbations are used.

32 papers1 benchmarksTexts

AUTSL (Ankara University Turkish Sign Language Dataset)

The Ankara University Turkish Sign Language Dataset (AUTSL) is a large-scale, multimode dataset that contains isolated Turkish sign videos. It contains 226 signs that are performed by 43 different signers. There are 38,336 video samples in total. The samples are recorded using Microsoft Kinect v2 in RGB, depth and skeleton formats. The videos are provided at a resolution of 512Ă—512. The skeleton data contains spatial coordinates, i.e. (x, y), of the 25 junction points on the signer body that are aligned with 512Ă—512 data.

32 papers1 benchmarksVideos

Blended Skill Talk

To analyze how these capabilities would mesh together in a natural conversation, and compare the performance of different architectures and training schemes.

32 papers0 benchmarks

EmoBank

EmoBank is a corpus of 10k English sentences balancing multiple genres, annotated with dimensional emotion metadata in the Valence-Arousal-Dominance (VAD) representation format. EmoBank excels with a bi-perspectival and bi-representational design.

32 papers0 benchmarks

MedQuAD (Medical Question Answering Dataset)

MedQuAD includes 47,457 medical question-answer pairs created from 12 NIH websites (e.g. cancer.gov, niddk.nih.gov, GARD, MedlinePlus Health Topics). The collection covers 37 question types (e.g. Treatment, Diagnosis, Side Effects) associated with diseases, drugs and other medical entities such as tests.

32 papers0 benchmarksMedical, Texts

NWPU-Crowd

NWPU-Crowd consists of 5,109 images, in a total of 2,133,375 annotated heads with points and boxes. Compared with other real-world datasets, it contains various illumination scenes and has the largest density range (0~20,033).

32 papers2 benchmarks
PreviousPage 76 of 1000Next