Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Datasets by modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

MSQA

Multi-modal situated reasoning in 3D scenes

2 papers · 0 benchmarks

https://zenodo.org/records/15349731 (LoRaWAN Path Loss Measurements in an Indoor Office Setting including Environmental Factors/Conditions)

This dataset was collected during a LoRaWAN measurement campaign in a multi-room indoor office environment at the University of Siegen, Germany. It contains over 1.7 million time-stamped records from 6 LoRaWAN nodes, each transmitting once per minute to a single gateway. Each record includes environmental parameters, namely temperature, relative humidity, barometric pressure, particulate matter (PM2.5), and carbon dioxide (CO₂), as well as device metadata such as RSSI, SNR, and spreading factor (SF). The dataset also includes the effective signal power (ESP) and the noise power (NP) for LoRaWAN propagation analysis. It is designed to support research on indoor wireless propagation, distance estimation, environment-aware modeling, and other IoT use cases and applications in line with 6G flagship demands.
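
The entry does not state how ESP and NP were computed. A common convention, assumed here and not confirmed by the dataset card, treats RSSI as the total received power (signal plus noise) and SNR as the signal-to-noise ratio. Under that assumption both quantities follow from RSSI and SNR alone. The function names below are hypothetical:

```python
import math


def noise_power_dbm(rssi_dbm: float, snr_db: float) -> float:
    """Noise power NP in dBm (assumed convention, not from the dataset card).

    Assumes RSSI is the total received power S + N, so in linear units
    RSSI = N * (1 + SNR) and therefore N = RSSI / (1 + SNR).
    """
    return rssi_dbm - 10 * math.log10(1 + 10 ** (snr_db / 10))


def effective_signal_power_dbm(rssi_dbm: float, snr_db: float) -> float:
    """Effective signal power ESP in dBm: since S = N * SNR, ESP = NP + SNR (in dB)."""
    return noise_power_dbm(rssi_dbm, snr_db) + snr_db


if __name__ == "__main__":
    # Hypothetical reading: RSSI = -90 dBm, SNR = -10 dB.
    rssi, snr = -90.0, -10.0
    print(f"NP  = {noise_power_dbm(rssi, snr):.2f} dBm")
    print(f"ESP = {effective_signal_power_dbm(rssi, snr):.2f} dBm")
```

With a negative SNR, ESP falls below NP. This is consistent with LoRa's ability to demodulate signals below the noise floor. Converting ESP and NP back to linear units and summing them recovers the RSSI.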

2 papers · 0 benchmarks

SVBench (Streaming Video Understanding Benchmark)

Dataset Card for SVBench. This dataset card aims to provide a comprehensive overview of the SVBench dataset, including its purpose, structure, and sources. For details, see our project page, paper, and GitHub repository.

2 papers · 0 benchmarks

CUHK01 (CUHK Person Re-identification)

This dataset contains 971 identities captured from two disjoint camera views, with two samples per identity in each view. It is used for person re-identification.

1 paper · 0 benchmarks · Images

STB (Stereo Hand Pose Benchmark)

A 3D hand pose dataset created using a stereo camera.

1 paper · 7 benchmarks

Multi-Ego

Multi-Ego is a multi-view egocentric dataset recorded simultaneously by three cameras, covering a wide variety of real-life scenarios. The footage is annotated by multiple individuals under various summarization configurations, and a consensus analysis ensures reliable ground truth.

1 paper · 0 benchmarks · Videos

PhC-U373

1 paper · 0 benchmarks

Paper Field

Paper Field is built from the Microsoft Academic Graph and maps paper titles to one of 7 fields of study. Each field of study (geography, politics, economics, business, sociology, medicine, and psychology) has approximately 12K training examples.

1 paper · 3 benchmarks · Texts

KT3DMoSeg

More details about this dataset are available at https://alex-xun-xu.github.io/ProjectPage/CVPR_18/index.html

1 paper · 1 benchmark · Videos

Helsinki Prosody Corpus

The Helsinki Prosody Corpus is a dataset for predicting prosodic prominence from written text. It provides automatically generated, high-quality prosodic annotations for the 'clean' subsets of the LibriTTS corpus (Zen et al., 2019), which comprise 262.5 hours of read speech from 1,230 speakers. The transcribed sentences were aligned and then annotated with word-level acoustic prominence labels.

1 paper · 1 benchmark · Texts

HARD (Hotel Arabic-Reviews Dataset)

The Hotel Arabic-Reviews Dataset (HARD) contains 93,700 hotel reviews in Arabic, collected from the Booking.com website during June and July 2016. The reviews are written in Modern Standard Arabic as well as dialectal Arabic.

1 paper · 1 benchmark

ASLG-PC12 (English-ASL Gloss Parallel Corpus 2012)

An artificial corpus built using grammatical dependency rules, created because of the lack of resources for sign language.

1 paper · 1 benchmark · Texts

IRMA (15,363 IRMA images of 193 categories for ImageCLEFmed 2009)

This collection compiles anonymous radiographs, arbitrarily selected from clinical routine at the Department of Diagnostic Radiology, Aachen University of Technology (RWTH), Aachen, Germany. The images cover different ages, genders, view positions, and pathologies, so image quality varies significantly. All images were downscaled to fit into a 512 × 512 bounding box while maintaining the original aspect ratio. All images were classified according to the IRMA code, from which 193 categories were defined. These categories are provided for 12,677 images; the remaining 1,733 images, which have no code, are used as test data for the ImageCLEFmed 2009 competition.
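
The downscaling step can be sketched as a single scale factor applied to both sides. The sketch assumes that images already smaller than the box were left at their original size (the entry only says images were downscaled) and that sizes are rounded to the nearest pixel. `fit_within_box` is a hypothetical helper, not part of any IRMA tooling:

```python
def fit_within_box(width: int, height: int, box: int = 512) -> tuple[int, int]:
    """Return the size of a width x height image scaled to fit a box x box square.

    One scale factor is applied to both sides, so the aspect ratio is preserved.
    The factor is capped at 1.0, so images already inside the box are not enlarged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = min(box / width, box / height, 1.0)
    # Each scaled side is at most `box`, an integer, so rounding cannot exceed it.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    for size in [(2048, 1536), (1000, 3000), (400, 300)]:
        print(size, "->", fit_within_box(*size))
```

Pillow's `Image.thumbnail` performs a similar in-place, aspect-preserving downscale that never enlarges the image. Its rounding of the final size may differ slightly from this sketch.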

1 paper · 0 benchmarks · Images

CL-SciSumm

1 paper · 2 benchmarks

RuDaS (Synthetic Datasets for Rule Learning)

Logical rules are a popular knowledge representation language in many domains. Recently, neural networks have been proposed to support the complex rule induction process. However, we argue that existing datasets and evaluation approaches are lacking in various dimensions; for example, different kinds of rules or dependencies between rules are neglected. Moreover, for the development of neural approaches, we need large amounts of data to learn from and adequate, approximate evaluation measures. In this paper, we provide a tool for generating diverse datasets and for evaluating neural rule learning systems, including novel performance metrics.

1 paper · 2 benchmarks

DroneDeploy

From DroneDeploy:

1 paper · 4 benchmarks · Images

WebEdit

A fact-based text editing dataset built from the WebNLG dataset.

1 paper · 9 benchmarks · Tabular, Texts

RotoEdit

A fact-based text editing dataset built from the RotoWire dataset.

1 paper · 9 benchmarks · Tabular, Texts

Pan+ChiPhoto

Pan+ChiPhoto is a Chinese character dataset that combines two datasets: ChiPhoto and the Pan_Chinese_Character dataset. Its images were mainly captured outdoors in Beijing and Shanghai, China, and cover various scenes such as signs, boards, advertisements, banners, and objects with text printed on their surfaces.

1 paper · 0 benchmarks · Images, Texts

Florentine

The Florentine dataset is a dataset of facial gestures containing facial clips of 160 subjects (both male and female), in which gestures were either posed on request or genuinely elicited by a shown stimulus. It contains 1,032 clips of posed expressions and 1,745 clips of induced facial expressions, for a total of 2,777 video clips. Genuine facial expressions were induced using visual stimuli, namely videos selected randomly from a bank of YouTube videos to elicit a specific emotion.

1 paper · 0 benchmarks · Videos