Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

298 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3d meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • Midi (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • Cad (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

298 dataset results

DurLAR (A High-Fidelity 128-Channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery)

DurLAR is a high-fidelity 128-channel 3D LiDAR dataset with panoramic ambient (near infrared) and reflectivity imagery for multi-modal autonomous driving applications. Compared to existing autonomous driving task datasets, DurLAR has the following novel features:

5 papers | 0 benchmarks | 3D, Images, LiDAR, Point cloud, RGB Video, Stereo, Time series

Vi-Fi Multi-modal Dataset

A large-scale multi-modal dataset to facilitate research on vision-wireless systems. It consists of vision, wireless, and smartphone motion sensor data from multiple participants and passer-by pedestrians in both indoor and outdoor scenarios. In Vi-Fi, the vision modality includes RGB-D video from a mounted camera. The wireless modality comprises smartphone data from participants, including WiFi FTM and IMU measurements.

5 papers | 3 benchmarks | RGB Video, RGB-D, Time series, Videos

Sales (Rossmann Store Sales)

Forecast Sales using ARIMA and SARIMA

5 papers | 0 benchmarks | Time series

VNAT (VPN/NONVPN NETWORK APPLICATION TRAFFIC DATASET)

This dataset is a collection of labelled PCAP files, both encrypted and unencrypted, across 10 applications, as well as a pandas dataframe in HDF5 format containing detailed metadata summarizing the connections from those files. It was created to assist the development of machine learning tools that would allow operators to see the traffic categories of both encrypted and unencrypted traffic flows. In particular, features of the network packet traffic timing and size information (both inside of and outside of the VPN) can be leveraged to predict the application category that generated the traffic.

5 papers | 0 benchmarks | Tables, Time series

WEAR (WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity Recognition)

WEAR is an outdoor sports dataset for both vision- and inertial-based human activity recognition (HAR). The dataset comprises data from 22 participants performing a total of 18 different workout activities with untrimmed inertial (acceleration) and camera (egocentric video) data recorded at 11 different outside locations. Unlike previous egocentric datasets, WEAR provides a challenging prediction scenario marked by purposely introduced activity variations as well as an overall small information overlap across modalities.

5 papers | 0 benchmarks | Time series, Videos

voraus-AD

voraus-AD contains machine data of a collaborative robot that moves a can by performing an industrial pick-and-place task. The samples consist of time series of machine data, each recorded over one pick-and-place operation. As usual in anomaly detection, the training set contains only normal data, that is, regular samples without anomalies. The test set contains both normal data and anomalies, including 12 diverse anomaly types. In order to create a realistic scenario, we have divided the normal data into training and test data as follows: up to a certain point in time, only training data (948 samples) was recorded. Subsequently, recordings of anomalies (755 samples) and normal data (419 samples) for the test set were taken alternately. This simulates a real application where training data would be recorded first in the same way to train the model before the test case occurs. To exclude temperature effects, we let robots warm up for half an hour before each recording.

5 papers | 1 benchmark | Time series

ClimateSet (ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning)

Climate models are critical tools for analyzing climate change and projecting its future impact. The machine learning (ML) community has taken an increased interest in supporting climate scientists’ efforts on various tasks such as climate model emulation, downscaling, and prediction tasks. However, traditional datasets based on single climate models are limiting. We thus present ClimateSet — a comprehensive collection of inputs and outputs from 36 climate models sourced from the Input4MIPs and CMIP6 archives, designed for large-scale ML applications.

5 papers | 0 benchmarks | 3D, Time series

WiGesture (Wireless Sensing Dataset for Gesture Recognition and People ID Identification with ESP32)

The WiGesture dataset contains data for gesture recognition and people ID identification in a meeting room scenario. The dataset provides a synchronised CSI, RSSI, and timestamp for each sample. It can be used for research on WiFi-based human gesture recognition and people ID identification.

5 papers | 2 benchmarks | Time series

PhyAAt (Physiology of Auditory Attention)

The dataset contains a collection of physiological signals (EEG, GSR, PPG) obtained from an experiment on auditory attention to natural speech. Ethical approval was obtained for the experiment. Details of the experiment can be found at https://phyaat.github.io/experiment

4 papers | 2 benchmarks | EEG, Time series

PhysioNet Challenge 2020

The data for this Challenge are from multiple sources:

  • CPSC Database and CPSC-Extra Database
  • INCART Database
  • PTB and PTB-XL Database
  • The Georgia 12-lead ECG Challenge (G12EC) Database
  • Undisclosed Database

The first source is the public data (CPSC Database) and unused data (CPSC-Extra Database) from the China Physiological Signal Challenge in 2018 (CPSC2018), held during the 7th International Conference on Biomedical Engineering and Biotechnology in Nanjing, China. The unused data from the CPSC2018 is NOT the test data from the CPSC2018. The test data of the CPSC2018 is included in the final private database that has been sequestered. This training set consists of two sets of 6,877 (male: 3,699; female: 3,178) and 3,453 (male: 1,843; female: 1,610) 12-lead ECG recordings lasting from 6 seconds to 60 seconds. Each recording was sampled at 500 Hz.

4 papers | 35 benchmarks | Biomedical, Time series
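
As a rough illustration of what the figures in the PhysioNet Challenge 2020 description imply, the sketch below converts recording durations into sample counts at the stated 500 Hz sampling rate for 12-lead recordings. The function names and the (leads, samples) array layout are assumptions made for this example; the description does not specify how the recordings are stored.

```python
SAMPLING_RATE_HZ = 500  # stated in the description for the CPSC recordings
NUM_LEADS = 12          # 12-lead ECG


def num_samples(duration_s, fs=SAMPLING_RATE_HZ):
    """Number of samples per lead for a recording of the given duration."""
    return int(round(duration_s * fs))


def recording_shape(duration_s):
    """Hypothetical (leads, samples) shape; the real file layout may differ."""
    return (NUM_LEADS, num_samples(duration_s))


# Shortest and longest recordings mentioned in the description.
for duration in (6, 60):
    print(duration, "s ->", recording_shape(duration))
```

Under these assumptions, recordings range from 3,000 to 30,000 samples per lead, so models usually need padding, cropping, or windowing to handle the variable lengths.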

Lorenz Dataset

The Lorenz dataset contains 100,000 time series of length 24. The data has 5 modes and is obtained using the Lorenz equation with 5 different seed values.

4 papers | 0 benchmarks | Time series
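
The Lorenz dataset description does not say how the series are generated from the seeds, so the following is only a minimal sketch of one plausible reading: each seed fixes a random initial condition, and a length-24 series is produced by integrating the Lorenz equations. The classic parameters (sigma = 10, rho = 28, beta = 8/3), the step size, the initial-condition ranges, the Runge-Kutta integrator, and all function names are assumptions, not the dataset authors' procedure.

```python
import random


def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz system by one 4th-order Runge-Kutta step."""

    def f(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def make_series(seed, length=24, dt=0.01):
    """Return the x-coordinate of a Lorenz trajectory started from a seeded point."""
    rng = random.Random(seed)
    # Hypothetical initial-condition ranges, chosen only for illustration.
    state = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(20, 30))
    series = []
    for _ in range(length):
        state = lorenz_step(state, dt)
        series.append(state[0])
    return series


# One example series per seed ("mode"); five seeds as in the description.
modes = {seed: make_series(seed) for seed in range(5)}
print({seed: len(series) for seed, series in modes.items()})
```

Seeding a local `random.Random` instance keeps each series reproducible without touching global random state.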

eSports Sensors Dataset

The eSports Sensors dataset contains sensor data collected from 10 players in 22 matches in League of Legends. The sensor data collected includes:

4 papers | 6 benchmarks | 6D, Actions, Biomedical, EEG, Environment, Replay data, Tabular, Time series, Tracking

TCC

The largest and most realistic dataset available for TCC. It consists of 600 real-world videos recorded with a high-resolution mobile phone camera shooting 1824 x 1368 sized pictures. The length of these videos ranges from 3 to 17 frames (7.3 on average, the median is 7.0 and mode is 8.5). Ground truth information is present only for the last frame in each video (i.e., the shot frame), and was collected using a gray surface calibration target.

4 papers | 0 benchmarks | Time series

Solar-Power (Solar Power Data for Integration Studies (Alabama))

NREL's Solar Power Data for Integration Studies are synthetic solar photovoltaic (PV) power plant data points for the United States representing the year 2006.

4 papers | 6 benchmarks | Time series

eICU-CRD (eICU Collaborative Research Database)

The eICU Collaborative Research Database is a large multi-center critical care database made available by Philips Healthcare in partnership with the MIT Laboratory for Computational Physiology.

4 papers | 0 benchmarks | Medical, Tables, Tabular, Time series

NILoc (Neural Inertial Localization)

IMU and WiFi data, along with aligned visual SLAM ground-truth locations, from a smartphone carried during natural human motion.

4 papers | 0 benchmarks | Time series

VISUELLE2.0

Visuelle 2.0 is a dataset containing real data for 5355 clothing products of the Italian fast-fashion retail company Nuna Lie. Specifically, Visuelle 2.0 provides data from 6 fashion seasons (partitioned into Autumn-Winter and Spring-Summer) from 2017-2019, right before the Covid-19 pandemic. Each product is accompanied by an HD image, textual tags and more. The time series data are disaggregated at the shop level, and include the sales, inventory stock, max-normalized prices (for the sake of confidentiality) and discounts. Exogenous time series data is also provided, in the form of Google Trends based on the textual tags and multivariate weather conditions of the stores’ locations. Finally, we also provide purchase data for 667K customers whose identity has been anonymized, to capture personal preferences. With these data, Visuelle 2.0 makes it possible to address several problems that characterize the activity of a fast fashion company: new product demand forecasting, short-observation new pr

4 papers | 4 benchmarks | Images, Texts, Time series

TimeHetNet (Meta Dataset for Time Series with heterogeneous networks)

This meta-dataset is composed of previously known datasets.

4 papers | 0 benchmarks | Time series

Monash

Time Series Forecasting Repository containing datasets of related time series for global forecasting.

4 papers | 0 benchmarks | Time series

PDEBench - Benchmark for Scientific Machine Learning

PDEBench provides a diverse and comprehensive set of benchmarks for scientific machine learning, including challenging and realistic physical problems. The repository consists of the code used to generate the datasets, to upload and download the datasets from the data repository, and to train and evaluate different machine learning models as baselines. PDEBench features a much wider range of PDEs than existing benchmarks and includes realistic and difficult problems (both forward and inverse), as well as larger ready-to-use datasets comprising various initial and boundary conditions and PDE parameters. Moreover, PDEBench was created to make the source code extensible, and we invite active participation to improve and extend the benchmark.

4 papers | 0 benchmarks | Images, Physics, Time series, Videos
Page 4 of 15