Datasets

298 machine learning datasets

AViMoS (Audio-Visual Mouse Saliency)

A novel audio-visual mouse saliency (AViMoS) dataset with the following key features:

1 paper | 0 benchmarks | Audio, Time series, Tracking, Videos

MIARAD Dataset

Source: Radar reflectivity data from the HydroMeteorological Service of Arpae (Emilia-Romagna, Italy).
Geographical Coverage: Emilia-Romagna region, including the flat Po Valley, the Apennines, and coastal areas.
Time Period: 6 years (2015–2020).
Data Resolution:
- Temporal: Scans taken every 5 minutes.
- Spatial: 1 km grid resolution.
Area covered: 125 km radius per scan, covering a total of 71,172 square km.
Reflectivity Range: 0 to 60 dBZ, clipped from an original range of -20 dBZ to 60 dBZ.
Total Time Steps: 630,720.
Precipitating Events: 179,264 time steps representing precipitating sequences.
Non-precipitating Data: 71.5% of the data was discarded (non-precipitating).
Dataset Split:
- Training: 149,524 time steps.
- Validation: 7,869 time steps.
Test Sets:
- Tokenizer Test Set (TTS): 21,871 radar images, focusing on extreme events.
- Forecaster Test Set (FTS): 1,450 time steps from 10 selected extreme weather events (12 hours each).
Data Augmentation: Random cropping, 90-
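The figures quoted for MIARAD can be cross-checked with simple arithmetic. The sketch below is only a sanity check on the numbers in the description, not part of the dataset's tooling. It assumes 365-day years (leap days ignored), which is an observation about what reproduces the quoted total rather than something the source states, and it assumes each 12-hour event includes both endpoint scans.

```python
# Sanity check on the MIARAD figures quoted in the description.
# Assumption: 365-day years (leap days ignored); variable names are illustrative.
SCAN_INTERVAL_MIN = 5
YEARS = 6
DAYS_PER_YEAR = 365

scans_per_day = 24 * 60 // SCAN_INTERVAL_MIN           # 288 scans per day
total_steps = YEARS * DAYS_PER_YEAR * scans_per_day    # 630,720 time steps

precipitating_steps = 179_264
discarded_fraction = 1 - precipitating_steps / total_steps  # a little under 0.716

# Training, validation, and tokenizer test set sizes from the description.
train_steps, val_steps, tts_images = 149_524, 7_869, 21_871
split_total = train_steps + val_steps + tts_images     # equals precipitating_steps

# Assumption: a 12-hour event sampled every 5 minutes, counting both endpoints.
steps_per_event = 12 * 60 // SCAN_INTERVAL_MIN + 1     # 145
fts_steps = 10 * steps_per_event                       # 1,450

print(total_steps, round(discarded_fraction, 4), split_total, fts_steps)
```

Under these assumptions the training, validation, and TTS counts sum to the number of precipitating time steps, and the discarded share lands between 71.5% and 71.6%, in line with the quoted 71.5%.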

1 paper | 0 benchmarks | Time series

Open RAN Commercial Traffic Twinning Dataset

Dataset of cross-layer Radio Access Network (RAN) Key Performance Measurements (KPMs) and protocol stack logs collected on an Open RAN deployment instantiated on Colosseum with traffic twinned from that of commercial cellular traces. The dataset includes Base Station (BS)- and User Equipment (UE)-level KPMs from PHY, MAC, and App layers under different RAN configurations representative of AI/ML control policies, number of UEs, and traffic demand. The fine-grained metrics of the dataset make it possible to understand the connection between PHY and MAC KPMs measured at the BS and UEs, control policies, and end-to-end and App-layer KPMs that reflect user experience.

1 paper | 0 benchmarks | Time series

Quantifying Manufacturing Variation in Motor Drives

This dataset contains closed-loop position, velocity, and current trajectories from 83 motor drives, each consisting of a motor and a Harmonic Drive gearbox. It was created to aid in quantifying manufacturing variation at Mecademic Industrial Robotics. The dataset is suitable for system identification and state estimation tasks.

1 paper | 0 benchmarks | Time series

ARINC 429 Voltage Data

This page contains ARINC 429 message data recorded from a hardware-in-the-loop simulator. The messages were recorded using a SIGLENT SDS2204X Plus oscilloscope sampling at 20 MHz. The data is intended to enable cybersecurity research and development for ARINC 429 by providing detailed message data from multiple hardware sources.

1 paper | 0 benchmarks | Time series

41598_2022_22531_MOESM2_ESM.xlsx

The datasets used and analysed from the glucose clamp study are available in this Excel file. They include pseudonymised information on the participants, somatometric data, biomarkers of lipid metabolism, and parameters of insulin-glucose homeostasis, i.e. concentrations of insulin, glucose, and C-peptide, as well as data from glucose-clamp experiments, HOMA, SPINA Carb parameters (SPINA-GBeta and SPINA-GR), Matsuda index, insulinogenic index, disposition index, and McAuley index.

1 paper | 0 benchmarks | Biomedical, Medical, Tabular, Time series

41598_2022_22531_MOESM1_ESM.dif

The datasets used and analysed from the glucose clamp study are available in this DIF file. They include pseudonymised information on the participants, somatometric data, biomarkers of lipid metabolism, and parameters of insulin-glucose homeostasis, i.e. concentrations of insulin, glucose, and C-peptide, as well as data from glucose-clamp experiments, HOMA, SPINA Carb parameters (SPINA-GBeta and SPINA-GR), Matsuda index, insulinogenic index, disposition index, and McAuley index.

1 paper | 0 benchmarks | Biomedical, Medical, Tabular, Time series

Experiments data used for evaluating PerfSim simulation accuracy based on sfc-stress workloads (Michel Gokan Khan)

This dataset is used to evaluate PerfSim's accuracy and speed against a real deployment in a Kubernetes cluster, based on sfc-stress workloads.

1 paper | 0 benchmarks | Time series

WiFiCam

WiFiCam dataset for through-wall imaging based on WiFi channel state information. The corresponding source code repository is located at: https://github.com/StrohmayerJ/wificam

1 paper | 0 benchmarks | Environment, Images, RGB Video, Time series

3D Flow Shapes

The dataset consists of high-resolution three-dimensional (3D) turbulent flow simulations. It captures intricate vortex structures caused by a variety of shapes within a channel flow environment. The dataset is generated using OpenFOAM in large eddy simulation (LES) mode, ensuring the preservation of detailed turbulent characteristics across all spatial scales.

1 paper | 0 benchmarks | Time series, Videos

Digital Typhoon Dataset V2

1 paper | 0 benchmarks | Images, Time series

AutoTherm

Temporal Dataset for Indoor and In-Vehicle Thermal Comfort Estimation. Abstract: Thermal comfort estimation is essential for enhancing user experience in static indoor environments and dynamic in-vehicle scenarios. While traditional datasets focus on buildings, their application to fast-changing conditions, such as in vehicles, remains unexplored. We address this gap by introducing two temporal datasets collected from (1) a self-built climatic chamber with 31 sensor signals and user-labeled ratings from 18 participants and (2) in-vehicle studies with 20 participants in a BMW 3 Series.

1 paper | 0 benchmarks | Audio, EEG, Images, Time series, Tracking

VREM-FL datasets

This dataset collection includes three files used for the experiments. Each file contains 6 columns: {timestep, vehicle ID, x coordinate in the map, y coordinate in the map, real bitrate, estimated bitrate}. The datasets, obtained from REMs with Gaussian estimation and real (https://ieee-dataport.org/open-access/crawdad-romataxi) or simulated (https://eclipse.dev/sumo/) vehicular mobility, are used in the original paper for optimizing the task of federated learning (client scheduling and resource allocation).
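The six-column layout described above can be read into typed records as in the following sketch. The column order comes from the description; everything else is an assumption made for illustration: comma-separated values, no header row, integer timesteps, string vehicle IDs, and the `Sample`, `parse_rows`, and `mean_abs_error` names. The two demo rows are made up and are not taken from the dataset.

```python
import csv
import io
from typing import NamedTuple


class Sample(NamedTuple):
    """One row, in the column order given in the dataset description."""
    timestep: int
    vehicle_id: str
    x: float
    y: float
    real_bitrate: float
    estimated_bitrate: float


def parse_rows(text: str) -> list[Sample]:
    """Parse comma-separated rows (assumed: no header line) into Sample records."""
    reader = csv.reader(io.StringIO(text))
    return [
        Sample(int(t), v, float(x), float(y), float(real), float(est))
        for t, v, x, y, real, est in reader
    ]


def mean_abs_error(samples: list[Sample]) -> float:
    """Mean absolute gap between the real and estimated bitrate columns."""
    return sum(abs(s.real_bitrate - s.estimated_bitrate) for s in samples) / len(samples)


# Made-up rows for illustration only.
demo = "0,taxi_1,10.0,20.0,5.0,4.5\n1,taxi_1,11.0,20.5,6.0,6.5\n"
samples = parse_rows(demo)
print(len(samples), mean_abs_error(samples))
```

Comparing the real and estimated bitrate columns in this way is one simple use of the files; the original paper's scheduling and resource-allocation pipeline is not reproduced here.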

1 paper | 0 benchmarks | Tables, Time series

Diaphanous: Transparency Disclosures About the Sexual Exploitation of Minors

This dataset curates quantitative transparency disclosures about the online sexual exploitation of minors. In particular, it focuses on legally mandated reports to the national clearinghouse for the United States, the National Center for Missing and Exploited Children (NCMEC), and captures disclosures by electronic service providers as well as NCMEC.

1 paper | 0 benchmarks | Time series

VisQUIC

1 paper | 0 benchmarks | RGB-D, Time series

Centrifugal Pump Fault Detection

Dataset Overview: The dataset used for training and testing consists of vibration signals for six pump conditions:

1 paper | 0 benchmarks | Time series

Kyokushin Karate Motion Dataset (Optical motion capture dataset of selected techniques in beginner and advanced Kyokushin karate athletes)

Cleaned and preprocessed version of the Kyokushin Karate Motion Dataset by Szczęsna et al. The original dataset and detailed description can be found here.

1 paper | 0 benchmarks | Time series

TSFM-ScalingLaws-Dataset

1 paper | 0 benchmarks | Time series

Wearanize+ Dataset (v1.0)

Wearanize+ includes overnight sleep data from 130 participants (one night each) using three different wearable devices: Zmax headband, Empatica E4 wristband, and ActivPAL leg patch, alongside full-scale PSG recorded with SomnoScreen Plus and Mentalab Explore Pro. It also includes questionnaires, such as PSQI, MADRE, and PHQ-9, providing information on participants’ sleep, dreams, and overall health. (The link to access the dataset will be added soon).

1 paper | 0 benchmarks | Biomedical, Tabular, Time series

TRADES-LOB

TRADES-LOB comprises simulated TRADES market data for Tesla and Intel, for 29/01 and 30/01. Specifically, the dataset is structured into four CSV files, each containing 50 columns. The initial six columns delineate the order features, followed by 40 columns that represent a snapshot of the LOB across the top 10 levels. The concluding four columns provide key financial metrics: mid-price, spread, order volume imbalance, and Volume-Weighted Average Price (VWAP), which can be useful for downstream financial tasks, such as stock price prediction. In total, the dataset is composed of 265,986 rows and 13,299,300 cells, which is similar in size to the benchmark FI-2010 dataset.
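The 50-column layout described above (6 order features, 40 LOB values, 4 metrics) can be split by position, as in this minimal sketch. The helper name `split_row` and the placeholder row are hypothetical. The ordering of values inside the 40-column LOB block is not stated in the description, so the sketch slices it out without interpreting it.

```python
# Column counts taken from the dataset description.
N_ORDER, N_LOB, N_METRICS = 6, 40, 4
N_COLUMNS = N_ORDER + N_LOB + N_METRICS  # 50


def split_row(row):
    """Split one 50-value row into (order features, LOB snapshot, metrics)."""
    if len(row) != N_COLUMNS:
        raise ValueError(f"expected {N_COLUMNS} values, got {len(row)}")
    order = row[:N_ORDER]
    lob = row[N_ORDER:N_ORDER + N_LOB]
    # Per the description: mid-price, spread, order volume imbalance, VWAP.
    metrics = row[N_ORDER + N_LOB:]
    return order, lob, metrics


# Placeholder row of 50 integers, for illustration only (not real data).
row = list(range(N_COLUMNS))
order, lob, metrics = split_row(row)

# The quoted cell count is the row count times the 50 columns.
total_cells = 265_986 * N_COLUMNS
print(len(order), len(lob), len(metrics), total_cells)
```

The last line of the sketch confirms that 265,986 rows of 50 columns give the 13,299,300 cells quoted in the description.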

1 paper | 0 benchmarks | Financial, Time series
