Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

298 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

298 dataset results

HASCD (Human Activity Segmentation Challenge Dataset)

HASCD (Human Activity Segmentation Challenge Dataset) contains 250 annotated multivariate time series capturing 10.7 h of real-world human motion smartphone sensor data from 15 bachelor computer science students. The recordings capture 6 distinct human motion sequences designed to represent pervasive behaviour in realistic indoor and outdoor settings. The data set serves as a benchmark for evaluating machine learning workflows.

1 paper · 0 benchmarks · Time series

MLO-Cn2 (Mauna Loa Seeing Study)

The Mauna Loa Seeing Study was performed by the EOL/Integrated Surface Flux System team, capturing surface meteorology and flux products at the Mauna Loa Observatory in Hawaii.

1 paper · 3 benchmarks · Time series

USNA-Cn2 (long-term) (United States Naval Academy Long-term Scintillation Study)

The USNA long-term scintillation study is a continuing effort to characterize and measure optical turbulence in the near-maritime boundary layer.

1 paper · 2 benchmarks · Time series

USNA-Cn2 (short-duration) (United States Naval Academy Short-duration Optical Turbulence Dataset)

The USNA long-term scintillation study is a continuing effort to characterize and measure optical turbulence in the near-maritime boundary layer.

1 paper · 3 benchmarks · Time series

SICKLE (Satellite Imagery for Cropping annotated with Keyparameter LabEls)

The availability of well-curated datasets has driven the success of Machine Learning (ML) models. Despite greater access to earth observation data in agriculture, there is a scarcity of curated and labelled datasets, which limits the potential of its use in training ML models for remote sensing (RS) in agriculture. To this end, we introduce a first-of-its-kind dataset called SICKLE, which constitutes a time series of multi-resolution imagery from 3 distinct satellites: Landsat-8, Sentinel-1 and Sentinel-2. Our dataset covers multi-spectral, thermal and microwave sensors over the period January 2018 to March 2021. We construct each temporal sequence by considering the cropping practices followed by farmers primarily engaged in paddy cultivation in the Cauvery Delta region of Tamil Nadu, India; and annotate the corresponding imagery with key cropping parameters at multiple resolutions (i.e. 3 m, 10 m and 30 m). Our dataset comprises 2,370 season-wise samples from 388 unique plots, having

1 paper · 1 benchmark · Environment, Images, Time series

BIDS CHB-MIT Scalp EEG Database

This dataset is a BIDS-compatible version of the CHB-MIT Scalp EEG Database. It reorganizes the file structure to comply with the BIDS specification. To this effect:

1 paper · 0 benchmarks · EEG, Medical, Time series

BIDS Siena Scalp EEG Database

This dataset is a BIDS-compatible version of the Siena Scalp EEG Database. It reorganizes the file structure to comply with the BIDS specification. To this effect:

1 paper · 0 benchmarks · EEG, Medical, Time series

Siena Scalp EEG Database (Physionet Siena Scalp EEG Database)

The database consists of EEG recordings of 14 patients acquired at the Unit of Neurology and Neurophysiology of the University of Siena. Subjects include 9 males (ages 25-71) and 5 females (ages 20-58). Subjects were monitored with a Video-EEG with a sampling rate of 512 Hz, with electrodes arranged on the basis of the international 10-20 System. Most of the recordings also contain 1 or 2 EKG signals. The diagnosis of epilepsy and the classification of seizures according to the criteria of the International League Against Epilepsy were performed by an expert clinician after a careful review of the clinical and electrophysiological data of each patient.

1 paper · 0 benchmarks · EEG, Medical, Time series

SeizeIT1

This dataset was obtained during an ICON project (2017-2018) in collaboration with KU Leuven (ESAT-STADIUS), UZ Leuven, UCB, Byteflies and Pilipili. The goal of this project was to design a system using behind-the-ear (bhe) EEG electrodes for monitoring the patient in a home environment. This way, a good balance can be found between sufficient accuracy of seizure detection algorithms (because EEG is used) and wearability (bhe EEG is relatively subtle, similar to a hearing aid). The dataset was acquired in the hospital during presurgical evaluation. During such presurgical evaluation, neurologists try to see if a specific part of the brain is causing the seizures, and if so, whether that part of the brain can be removed during surgery. During the presurgical evaluation, patients are monitored using the vEEG for multiple days (typically a week). Patients are, however, restricted to moving within their room because of the wiring and video analysis. In this dataset, the following data is available per p

1 paper · 0 benchmarks · EEG, Medical, Time series

uniD Dataset (University Drone Dataset)

The uniD dataset is an innovative collection of naturalistic road user trajectories, captured within the RWTH Aachen University campus using drone technology to address common challenges such as occlusions found in traditional traffic data collection methods. It meticulously documents the movement and classifies each road user by type. Employing cutting-edge computer vision algorithms, the dataset ensures high positional accuracy. Its utility spans various applications, from predicting road user behavior and modeling driver actions to conducting scenario-based safety checks for automated driving systems and facilitating the data-driven design of Highly Automated Driving (HAD) system components.

1 paper · 0 benchmarks · Time series, Tracking

PRONTO (PRONTO heterogeneous benchmark dataset)

The PRONTO heterogeneous benchmark dataset is based on an industrial-scale multiphase flow facility. It includes data from heterogeneous sources, including process measurements, alarm records, high frequency ultrasonic flow and pressure measurements, an operation log and video recordings. The study collected data from various operational conditions with and without induced faults to generate a multi-rate, multi-modal dataset. The dataset is suitable for developing and validating algorithms for fault detection and diagnosis (FDD) and data fusion.

1 paper · 8 benchmarks · Time series

BorealTC (Boreal Terrain Classification Dataset)

Recorded with a Husky A200 wheeled UGV, BorealTC contains 116 min of Inertial Measurement Unit (IMU), motor current, and wheel odometry data, focusing on typical boreal forest terrains, notably snow, ice, and silty loam. The dataset also includes experiments on asphalt and flooring. All runs were recorded in Forêt Montmorency and on the main campus of Université Laval, Quebec City, Québec, Canada.

1 paper · 1 benchmark · Time series

Vulpi et al. 2021 (San Cassiano Terrain Classification Dataset)

Recorded with a Husky A200 wheeled UGV, the Vulpi 2021 dataset contains 13 min of Inertial Measurement Unit (IMU), motor current, and wheel odometry data, focusing on agricultural terrains. The dataset includes experiments on concrete, a dirt road, a ploughed terrain and an unploughed terrain that were all recorded on an experimental farm in San Cassiano, Lecce, Italy.

1 paper · 0 benchmarks · Time series

Trust Dynamics and Market Behavior in Cryptocurrency (Trust Dynamics and Market Behavior in Cryptocurrency: A Comparative Study of Centralized and Decentralized Exchanges)

1 paper · 0 benchmarks · Tabular, Time series

MultiSenseBadminton (MultiSenseBadminton: Wearable Sensor–Based Biomechanical Dataset for Evaluation of Badminton Performance)

The sports industry is witnessing an increasing trend of utilizing multiple synchronized sensors for player data collection, enabling personalized training systems with multi-perspective real-time feedback. Badminton could benefit from these various sensors, but there is a scarcity of comprehensive badminton action datasets for analysis and training feedback. Addressing this gap, this paper introduces a multi-sensor badminton dataset for forehand clear and backhand drive strokes, based on interviews with coaches for optimal usability. The dataset covers various skill levels, including beginners, intermediates, and experts, providing resources for understanding biomechanics across skill levels. It encompasses 7,763 badminton swing data from 25 players, featuring sensor data on eye tracking, body tracking, muscle signals, and foot pressure. The dataset also includes video recordings, detailed annotations on stroke type, skill level, sound, ball landing, and hitting location, as well as s

1 paper · 0 benchmarks · RGB Video, Time series

CGNE-Snowflakes

An image sequence dataset of growing snowflakes in HDF5 format. Generated by the Gravner-Griffeath LCA model for snow crystal growth. Useful for modeling crystal growth with neural networks.

1 paper · 0 benchmarks · Images, Time series

ESA-AD (European Space Agency Dataset for Anomaly Detection in Satellite Telemetry)

ESA Anomaly Dataset is the first large-scale, real-life satellite telemetry dataset with curated anomaly annotations originating from three ESA missions. We hope that this unique dataset will allow researchers and scientists from academia, research institutes, national and international space agencies, and industry to benchmark models and approaches on a common baseline as well as research and develop novel, computationally efficient approaches for anomaly detection in satellite telemetry data.

1 paper · 0 benchmarks · Time series

Deep Neural Network Training Script incl. Data for HCCI Low-Temperature Combustion

Release for uploading scripts and data to Zenodo

1 paper · 0 benchmarks · Time series

r/AmITheAsshole Reddit threads

This dataset consists of 6,366 threads collected from the r/AmITheAsshole community on Reddit. The dataset contains a total of 6,372,251 comments. The collected threads constitute the “top” submissions, that is, those having the highest score, measured as the difference between upvotes and downvotes of a post. We downloaded them using PRAW, running 10 different queries across various temporal scopes, and then cleaned the obtained dataset by removing duplicated threads. Please refer to the paper, specifically to Table 3, for more details about the dataset.

1 paper · 0 benchmarks · Texts, Time series

Regional AQ Datasets

The primary environmental health threat in the WHO European Region is air pollution, which significantly impacts the daily health and well-being of its citizens. To effectively understand the impact and dynamics of air quality, a detailed investigation of different environmental, weather, and land cover indices is appropriate. To this end, this paper introduces three European cities’ spatiotemporal datasets, customized for air pollution monitoring at a regional level. The datasets are composed of major air quality measurements, weather measurements and land use information. They span approximately 2020 to 2023 with an hourly temporal resolution and a spatial resolution of 0.005°. The temporal and spatiotemporal datasets are publicly released, aiming to provide a solid foundation for researchers, analysts, and practitioners to conduct in-depth analyses of air pollution dynamics.

1 paper · 0 benchmarks · Time series