Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

298 machine learning datasets

Wallhack1.8k

The Wallhack1.8k dataset comprises 1,806 CSI amplitude spectrograms (and raw WiFi packet time series) corresponding to three activity classes: "no presence," "walking," and "walking + arm-waving." WiFi packets were transmitted at a frequency of 100 Hz, and each spectrogram captures a temporal context of approximately 4 seconds (400 WiFi packets).

2 papers | 0 benchmarks | Time series
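
The window size quoted above follows from the packet rate: 100 packets per second over roughly 4 seconds gives 400 packets per spectrogram. A minimal sketch of that arithmetic, with a hypothetical helper that cuts a raw packet series into consecutive windows. The non-overlapping segmentation and the function names are assumptions for illustration, not the dataset's documented preprocessing.

```python
PACKET_RATE_HZ = 100   # transmission rate stated in the dataset description
WINDOW_SECONDS = 4     # approximate temporal context per spectrogram


def window_length(rate_hz: int = PACKET_RATE_HZ, seconds: int = WINDOW_SECONDS) -> int:
    """Number of packets covered by one window."""
    return rate_hz * seconds


def split_into_windows(packets, length=None):
    """Cut a packet sequence into consecutive, non-overlapping windows.

    Assumption: windows do not overlap and a trailing partial window
    is dropped. The real dataset may segment its recordings differently.
    """
    if length is None:
        length = window_length()
    return [packets[i:i + length]
            for i in range(0, len(packets) - length + 1, length)]


if __name__ == "__main__":
    fake_packets = list(range(1000))  # stand-in for 10 s of packets
    windows = split_into_windows(fake_packets)
    print(window_length(), len(windows))
```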

Amazon MTPP (Marked Temporal Point Processes on Amazon data)

The dataset includes time-stamped user product review behavior from January 2008 to October 2018. Each user has a sequence of product review events; each event contains the timestamp and category of the reviewed product, and each category corresponds to an event type.

2 papers | 4 benchmarks | Tabular, Time series
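
A marked temporal point process sequence of the kind described above can be pictured as a per-user list of (timestamp, event type) pairs. The sketch below is one possible in-memory representation. The class and field names are hypothetical and do not reflect the dataset's actual file schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    timestamp: float  # time of the review (units are an assumption)
    mark: int         # event type, here the product category index


def inter_event_times(events: List[Event]) -> List[float]:
    """Gaps between consecutive events after sorting by timestamp."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    # Made-up toy sequence for a single user, not real data.
    user_sequence = [Event(0.0, 3), Event(2.5, 1), Event(4.0, 3)]
    print(inter_event_times(user_sequence))
```

Inter-event times plus marks are the usual inputs to point process models, which is why the gaps are computed here.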

StackOverflow MTPP (Marked Temporal Point Processes on StackOverflow data)

The dataset has two years of user awards on a question-answering website: each user received a sequence of badges and there are 22 different kinds of badges in total.

2 papers | 4 benchmarks | Tabular, Time series

AgeGroup Transactions MTPP (Marked Temporal Point Processes on financial transactions data)

The dataset contains historical financial transactions, including time, category and cost fields. There are 50000 clients, 205 categories and 43.7M events. The original goal was to predict the age group of the client. In this variant of the dataset, the goal is to forecast multiple future events.

2 papers | 4 benchmarks | Tabular, Time series

Chest wall lung sound dataset

Annotated audio files (with a separate combined annotation file) of lung sounds recorded from various vantage points on the chest wall. The annotation includes the sound type (Inspiratory: I, Expiratory: E, Wheezes: W, Crackles: C, Normal: N), the diagnosis as decided by a specialist (Asthma, COPD, BRON, heart failure, lung fibrosis, etc.), and the location on the chest wall from which the recording was taken (Posterior: P, Lower: L, Left: L, Right: R, Upper: U, Anterior: A, Middle: M). The audio file names are coded: 1. Filter type; B: Bell 20-200 Hz, Diaphragm 100-500 Hz, Extended range 50-500 Hz. 2. Patient number: P1-P112.

2 papers | 1 benchmark | Audio, Medical, Time series

WiFall (Wireless Sensing Dataset for Fall Detection, Action Recognition and People ID Identification with ESP32-S3)

The WiFall dataset contains data for fall detection, action recognition, and people ID identification in a meeting room scenario. The dataset provides synchronised CSI, RSSI, and timestamp for each sample.

2 papers | 1 benchmark | Time series

Digital twin-supported deep learning for fault diagnosis

This is a dataset used to test digital twin-supported deep learning for fault diagnosis. It contains a digital twin model of a robot, synthetic data generated from the digital twin to train a deep learning-based fault diagnosis model, and a real dataset collected from the real robot to test sim-to-real performance. Download the dataset from: https://nextcloud.centralesupelec.fr/s/7AR6aamBZNXcRM8/download

2 papers | 1 benchmark | Time series

WildPPG (WildPPG: A Real-World PPG Dataset of Long Continuous Recordings)

WildPPG is a dataset of multi-modal signals from wearable devices at four sites on the body. Each device continuously recorded synchronized signals from a 3-channel reflective photoplethysmogram (red, green, infrared PPG), a 3-axis inertial sensor (accelerometer), a temperature sensor, and a barometric altitude sensor. For reference, the sternum device continuously recorded a Lead-I electrocardiogram (ECG) from body-mounted gel electrodes to provide ground-truth heart rate (HR) estimates.

2 papers | 5 benchmarks | Biomedical, Time series

SPHERE-calorie

The dataset contains both RGB and depth images, and the data from two accelerometers, together with ground truth calorie values from a calorimeter for calorie expenditure estimation in home environments.

1 paper | 0 benchmarks | Images, RGB-D, Time series

VLUC (Video-Like Urban Computing)

VLUC (Video-Like Urban Computing) is a benchmark for video-like computing on citywide traffic density and crowd prediction. It consists of two new datasets, BousaiTYO and BousaiOSA, and the existing datasets TaxiBJ, BikeNYC I-II, and TaxiNYC.

1 paper | 0 benchmarks | Time series

Dataset for Mid-Price Forecasting of Limit Order Book Data

This is a benchmark dataset of high-frequency limit order book data for mid-price forecasting. The authors extracted normalized representations of time series data for five stocks from the NASDAQ Nordic stock market over ten consecutive days, yielding a dataset of ~4,000,000 time series samples in total. A day-based anchored cross-validation experimental protocol is also provided as a benchmark for comparing the performance of state-of-the-art methodologies.

1 paper | 0 benchmarks | Time series
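
One common reading of a day-based anchored cross-validation over ten days is an expanding-window scheme: every fold trains on all days from day 1 up to day k and tests on day k+1. The sketch below illustrates that reading only. The function name is hypothetical, and the exact fold definitions shipped with the dataset should be taken from its own documentation.

```python
from typing import List, Tuple


def anchored_day_splits(n_days: int) -> List[Tuple[List[int], int]]:
    """Expanding-window splits anchored at day 1.

    Fold k trains on days 1..k and tests on day k+1, so n_days days
    yield n_days - 1 folds. This is an assumed interpretation of
    "anchored", not the dataset's verified protocol.
    """
    days = list(range(1, n_days + 1))
    return [(days[:k], days[k]) for k in range(1, n_days)]


if __name__ == "__main__":
    for train_days, test_day in anchored_day_splits(10):
        print(f"train on days {train_days[0]}-{train_days[-1]}, test on day {test_day}")
```

Anchoring every training window at the first day keeps the test day strictly after the training data, which avoids leaking future prices into training.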

Milling Data Set (UC Berkeley Milling Data Set)

Data from experiments on a metal milling machine run at different speeds, feeds, and depths of cut. The wear of the milling insert, VB, is recorded. The data set was provided by the BEST lab at UC Berkeley.

1 paper | 0 benchmarks | Time series

Boombox

Boombox is a multi-modal dataset for visual reconstruction from acoustic vibrations. It was collected by dropping objects into a box and capturing the resulting images and vibrations, and it is used to train ML systems that predict images from vibration.

1 paper | 0 benchmarks | 3D, Audio, Images, RGB-D, Time series

The RBO Dataset of Articulated Objects and Interactions

The RBO dataset of articulated objects and interactions is a collection of 358 RGB-D video sequences (67:18 minutes) of humans manipulating 14 articulated objects under varying conditions (light, perspective, background, interaction). All sequences are annotated with ground truth of the poses of the rigid parts and the kinematic state of the articulated object (joint states) obtained with a motion capture system. We also provide complete kinematic models of these objects (kinematic structure and three-dimensional textured shape models). In 78 sequences the contact wrenches during the manipulation are also provided.

1 paper | 0 benchmarks | 3d meshes, Point cloud, RGB-D, Time series, Videos

Quo Vadis, Open Source? (Quo Vadis, Open Source? The Limits of Open Source Growth)

This is a complete set of the data we collected and analyzed in our study "Quo Vadis, Open Source? The Limits of Open Source Growth". Please see our GitHub repository for details and the tool chain.

1 paper | 0 benchmarks | Time series

TERRA-REF (TERRA-REF, An open reference data set from high resolution genomics, phenomics, and imaging sensors)

The ARPA-E funded TERRA-REF project is generating open-access reference datasets for the study of plant sensing, genomics, and phenomics. Sensor data were generated by a field scanner sensing platform that captures color, thermal, hyperspectral, and active fluorescence imagery as well as three-dimensional structure and associated environmental measurements. This dataset is provided alongside data collected using traditional field methods in order to support calibration and validation of algorithms used to extract plot-level phenotypes from these datasets.

1 paper | 0 benchmarks | 3D, Biology, Environment, Hyperspectral images, Point cloud, Stereo, Tabular, Time series

Well-being Dataset (Cambridge Well-being Dataset for Psychological Distress Analysis)

The dataset is a private dataset collected for automatic analysis of psychological distress. It contains self-reported distress labels provided by human volunteers. The dataset consists of 30-min interview recordings of participants.

1 paper | 1 benchmark | Audio, Speech, Time series, Videos

Nelson-Plosser (Nelson-Plosser US Macroeconomic Time Series)

US Macroeconomic dataset containing 14 time series of monthly observations. They have various lengths but all end in 1988. The variables: consumer price index, industrial production, nominal GNP, velocity, employment, interest rate, nominal wages, GNP deflator, money stock, real GNP, stock prices (S&P500), GNP per capita, real wages, unemployment.

1 paper | 0 benchmarks | Tables, Time series

Energy Consumption Curves of 499 Customers from Spain

Predictions of energy consumption are crucial for energy retailers to minimize deviations between the energy acquired in the day-ahead market and the actual consumption of their customers. The increasing spread of smart meters means that retailers have access to hourly consumption values of all their contracted customers in real time. Using machine learning algorithms, these hourly values can be used to predict the future energy consumption of the customers. The present data set allows the training and validation of AI-based prediction models.

1 paper | 0 benchmarks | Time series

Building air quality and pandemic risk simulation

The original paper contains a high-level explanation of the dataset characteristics, and potential use cases of the dataset. ArchABM can help to quantify the impact of some of these building- and company policy-related measures.

1 paper | 0 benchmarks | Graphs, Time series