Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

298 machine learning datasets

298 dataset results

MTHS

The MTHS dataset contains 30 Hz PPG signals obtained from 62 patients (35 men and 27 women). The ground truth data includes heart rate (HR) and oxygen saturation (SpO2) levels sampled at 1 Hz. The HR and SpO2 measurements were obtained using a pulse oximeter (M70). An iPhone 5s was used to obtain the PPG recordings at 30 fps.

2 papers | 6 benchmarks | Biomedical, Time series
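
The 30 Hz PPG signal and 1 Hz ground truth described for MTHS imply 30 PPG samples per label. A minimal sketch of that alignment follows, assuming the recording is available as a 1-D array and the labels as a parallel 1-D array. The function name `window_ppg` and the synthetic data are hypothetical and are not part of the dataset's tooling.

```python
import numpy as np

FS_PPG = 30   # PPG sampling rate in Hz (from the dataset description)
FS_LABEL = 1  # ground-truth sampling rate in Hz (from the dataset description)

def window_ppg(ppg, labels, fs_ppg=FS_PPG, fs_label=FS_LABEL):
    """Split a 1-D PPG signal into one window per ground-truth sample.

    Hypothetical helper: trailing samples that do not fill a whole
    window, and labels without a matching window, are dropped.
    """
    ppg = np.asarray(ppg, dtype=float)
    labels = np.asarray(labels, dtype=float)
    samples_per_label = fs_ppg // fs_label  # 30 PPG samples per label
    n = min(len(ppg) // samples_per_label, len(labels))
    windows = ppg[: n * samples_per_label].reshape(n, samples_per_label)
    return windows, labels[:n]

# Synthetic stand-in data (NOT real MTHS recordings):
# 10 s of a 1.2 Hz sinusoid, i.e. a 72 bpm pulse, with constant HR labels.
t = np.arange(10 * FS_PPG) / FS_PPG
fake_ppg = np.sin(2 * np.pi * 1.2 * t)
fake_hr = np.full(10, 72.0)

windows, hr = window_ppg(fake_ppg, fake_hr)
print(windows.shape, hr.shape)
```

Each row of `windows` can then be paired with the label at the same index, for example as input to a per-second HR or SpO2 regressor.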

Hotel (Hospitality > Tourism > Hotel Demand/Sales)

The dataset contains the hotel demand and revenue of 8 major tourist destinations in the US (e.g., Los Angeles, Orlando ...). The dataset contains sales, daily occupancy, demand, and revenue of the upper-middle class hotels.

2 papers | 0 benchmarks | Tabular, Time series

Ultra-processed Food Dataset

The raw data are obtained from an industrial plant for ultra-processed food production. Sampling was carried out every 5 minutes, while the total production cycle takes approximately 3 hours, from raw ingredients to final semi-finished products. The extracted data represent approximately 80 days of production. Variables 2 to 14 belong to 4 specific phases of the process and influence the qualitative variable 17. Variables 15 and 16 are external variables not controlled by the process that affect the final product. Some variation may be due to changes in raw materials, in production flow (variable 1), or to possible reconfiguration between weeks. However, while the magnitude of effects may change between weeks, the causal relationships are dictated by the plant and process dynamics and are consistent throughout production (barring potential unobserved confounders and faults).

2 papers | 0 benchmarks | Graphs, Time series

MIMIC PERform Testing Dataset

The MIMIC PERform Testing dataset contains the following physiological signals recorded from 200 critically-ill patients during routine clinical care:

2 papers | 10 benchmarks | Biomedical, Medical, Time series

Norwegian Endurance Athlete ECG Database

The Norwegian Endurance Athlete ECG Database contains 12-lead ECG recordings from 28 elite athletes from various sports in Norway. All recordings are 10-second resting ECGs recorded with a General Electric (GE) MAC VUE 360 electrocardiograph. All ECGs are interpreted by both the GE Marquette SL12 algorithm (version 23 (v243)) and one cardiologist trained in interpreting athletes' ECGs. The data was collected at the University of Oslo in February and March 2020.

2 papers | 0 benchmarks | Biomedical, Medical, Time series

Multidimensional Texture Perception (Multidimensional Textural Perception and Classification Through Whisker)

Texture-based studies and designs have been in focus recently. Whisker-based multidimensional surface texture data is missing in the literature. This data is critical for robotics and machine perception algorithms in the classification and regression of textural surfaces. We present a novel sensor design to acquire multidimensional texture information. The surface texture's roughness and hardness were measured experimentally using sweeping and dabbing. The data is made available to the research community for further advancing texture perception studies.

2 papers | 0 benchmarks | Time series

Stanford ECoG library: ECoG to Finger Movements

Electrophysiological data from implanted electrodes in the human brain are rare, and therefore scientific access to it has remained somewhat exclusive. Here we present a freely-available curated library of implanted electrocorticographic (ECoG) data and analyses for 16 benchmark behavioral experiments, with 204 individual datasets from 34 patients made with the same amplifiers (at the same sampling rate and filter settings). In every case, electrode positions have been carefully registered to brain anatomy. A large set of fully-commented analysis scripts to interpret these data using modern techniques is embedded in the library alongside the data. All data, anatomic correlations, and analysis files (MATLAB code) are in a common, intuitive file structure at https://searchworks.stanford.edu/view/zk881ps0522. The library may be used as course material or serve as a starter package for researchers early in their career or for established groups, to modify the analyses and re-apply them in

2 papers | 1 benchmark | Biomedical, Time series

Berlin V2X

The Berlin V2X dataset offers high-resolution GPS-located wireless measurements across diverse urban environments in the city of Berlin for both cellular and sidelink radio access technologies, acquired with up to 4 cars over 3 days. The data thus enables a variety of ML studies on vehicle-to-anything (V2X) communication.

2 papers | 0 benchmarks | Tabular, Time series

CaFFe (CAlving Fronts and where to Find thEm)

The temporal variability in calving front positions of marine-terminating glaciers permits inference on the frontal ablation. Frontal ablation, the sum of the calving rate and the melt rate at the terminus, significantly contributes to the mass balance of glaciers. Therefore, the glacier area has been declared as an Essential Climate Variable product by the World Meteorological Organization. The presented dataset provides the necessary information for training deep learning techniques to automate the process of calving front delineation. The dataset includes Synthetic Aperture Radar (SAR) images of seven glaciers distributed around the globe. Five of them are located in Antarctica: Crane, Dinsmoore-Bombardier-Edgeworth, Mapple, Jorum and the Sjörgen-Inlet Glacier. The remaining glaciers are the Jakobshavn Isbrae Glacier in Greenland and the Columbia Glacier in Alaska. Several images were taken for each glacier, forming a time series. The time series lie in the time span between 1995 an

2 papers | 1 benchmark | Environment, Images, Time series

ATMs fault prediction

The collected dataset consists of multivariate time series (MTS) data from several bank ATMs, along with the annotations that operators made when they performed a maintenance task on any of the machines.

2 papers | 0 benchmarks | Tabular, Time series

Large-scale Ridesharing DARP Instances Based on Real Travel Demand

This dataset presents a set of large-scale ridesharing Dial-a-Ride Problem (DARP) instances. The instances were created as a standardized set of ridesharing DARP problems for the purpose of benchmarking and comparing different solution methods.

2 papers | 0 benchmarks | Graphs, Tables, Tabular, Time series

Consumer Spendings (Finance > US Economy > Consumer Spendings)

State-level data for the US economy through the lens of consumer spending (credit/debit spending). The dataset is enriched with state-level economic dynamics and policy responses. Specifically, the data are further enriched with state-level policies as indicators of extreme events (e.g., a state’s business closure order).

2 papers | 2 benchmarks | Time series

neuronIO (Single cortical neuron (L5PC) input output simulation at 1ms temporal resolution)

Single cortical neurons as deep artificial neural networks. This dataset contains training and testing subsets of the input/output relationship of a single cortical layer 5 pyramidal cell (L5PC) neuron at 1 ms single-spike temporal resolution. The data is obtained via a simulation that contains all of the currently (2021) known and well-modeled "messy biological details" that relate to the operation of single neurons in the brain.

2 papers | 0 benchmarks | Biology, Time series

TVL Dataset (Touch-Vision-Language Dataset)

Touch is an important sensing modality for humans, but it has not yet been incorporated into a multimodal generative language model. This is partially due to the difficulty of obtaining natural language labels for tactile data and the complexity of aligning tactile readings with both visual observations and language descriptions. As a step towards bridging that gap, this work introduces a new dataset of 44K in-the-wild vision-touch pairs, with English language labels annotated by humans (10%) and textual pseudo-labels from GPT-4V (90%). We use this dataset to train a vision-language-aligned tactile encoder for open-vocabulary classification and a touch-vision-language (TVL) model for text generation using the trained encoder. Results suggest that by incorporating touch, the TVL model improves (+29% classification accuracy) touch-vision-language alignment over existing models trained on any pair of those modalities. Although only a small fraction of the dataset is human-labeled, the TVL

2 papers | 0 benchmarks | Images, Texts, Time series, Videos

WetLinks (WetLinks: a Large-Scale Longitudinal Starlink Dataset with Contiguous Weather Data)

This dataset includes stationary measurements of Starlink setups recorded over several months at two sites in Central Europe: Osnabrück (GER) and Enschede (NL). Throughput measurements were UDP-based. The dataset also contains high-quality weather data collected directly at the measurement sites. See the paper for details.

2 papers | 0 benchmarks | Time series

edeniss2020 (EDEN ISS 2020 Telemetry Dataset)

The edeniss2020 dataset is a time series dataset. It consists of equidistant sensor readings from 97 sensors in the EDEN ISS research greenhouse.

2 papers | 0 benchmarks | Time series

DTGB (Dynamic Text-attributed Graph Benchmark)

We introduce Dynamic Text-attributed Graph Benchmark (DTGB), a collection of large-scale, time-evolving graphs from diverse domains, with nodes and edges enriched by dynamically changing text attributes and categories. To facilitate the use of DTGB, we design standardized evaluation procedures based on four real-world use cases: future link prediction, destination node retrieval, edge classification, and textual relation generation. These tasks require models to understand both dynamic graph structures and natural language, highlighting the unique challenges posed by dynamic text-attributed graphs (DyTAGs).

2 papers | 0 benchmarks | Graphs, Texts, Time series

HCP Aging (Lifespan Human Connectome Project Aging)

Lifespan HCP Release 2.0 includes cross-sectional visit 1 (V1) preprocessed structural and functional imaging data, unprocessed V1 imaging data for all included modalities (structural, high-res hippocampal T2, resting state fMRI, task fMRI, diffusion, and ASL), and non-imaging demographic and behavioral assessment data from 725 HCP-Aging (HCP-A, ages 36-100+) healthy participants (22+ TB of data).

2 papers | 2 benchmarks | 3D, Images, Medical, Time series

UK Biobank Brain MRI (UK Biobank Data - Brain MRI)

UK Biobank participants have generously provided a very wide range of information about their health and well-being since recruitment began in 2006. This has been added to in the following ways: 

2 papers | 2 benchmarks | 3D, Images, Texts, Time series

BASEPROD (The Bardenas Semi-Desert Planetary Rover Dataset)

BASEPROD provides comprehensive rover sensor data collected over a 1.7 km traverse, accompanied by high-resolution 2D and 3D drone maps of the terrain. The dataset also includes laser-induced breakdown spectroscopy (LIBS) measurements from key sampling sites along the rover's path, as well as weather station data to contextualize environmental conditions.

2 papers | 0 benchmarks | 3D, Environment, Images, Point cloud, RGB-D, Stereo, Tabular, Time series