298 machine learning datasets
A novel audio-visual mouse saliency (AViMoS) dataset with the following key features:
Source: Radar reflectivity data from the HydroMeteorological Service of Arpae (Emilia-Romagna, Italy).
Geographical coverage: Emilia-Romagna region, including the flat Po Valley, the Apennines, and coastal areas.
Time period: 6 years (2015–2020).
Data resolution:
-Temporal: scans taken every 5 minutes.
-Spatial: 1 km grid resolution.
Area covered: 125 km radius per scan, covering a total of 71,172 square km.
Reflectivity range: 0 to 60 dBZ, clipped from an original range of -20 dBZ to 60 dBZ.
Total time steps: 630,720.
Precipitating events: 179,264 time steps representing precipitating sequences.
Non-precipitating data: 71.5% of the data was discarded (non-precipitating).
Dataset split:
-Training: 149,524 time steps.
-Validation: 7,869 time steps.
Test sets:
-Tokenizer Test Set (TTS): 21,871 radar images, focusing on extreme events.
-Forecaster Test Set (FTS): 1,450 time steps from 10 selected extreme weather events (12 hours each).
Data augmentation: Random cropping, 90-
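The reported totals can be cross-checked with simple arithmetic: six 365-day years of 5-minute scans give 6 × 365 × 288 = 630,720 time steps, and the training, validation and TTS counts sum to 179,264. The sketch below reproduces that count and shows reflectivity clipping as plain saturation into [0, 60] dBZ. The helper names are hypothetical, and the authors' actual preprocessing (for example, how values below 0 dBZ are handled) may differ.

```python
# Sketch: sanity checks for the radar dataset figures listed above.
# Assumptions: one scan every 5 minutes, 365-day years (leap days ignored).
# Helper names are hypothetical, not part of the dataset release.

SCAN_INTERVAL_MIN = 5
SCANS_PER_DAY = 24 * 60 // SCAN_INTERVAL_MIN  # 288 scans per day


def expected_time_steps(years: int, days_per_year: int = 365) -> int:
    """Number of 5-minute scans in `years` years, ignoring leap days."""
    return years * days_per_year * SCANS_PER_DAY


def clip_reflectivity(values, lo: float = 0.0, hi: float = 60.0):
    """Saturate reflectivity values (dBZ) into [lo, hi], e.g. -20..60 -> 0..60."""
    return [min(max(v, lo), hi) for v in values]


if __name__ == "__main__":
    print(expected_time_steps(6))                   # 630720
    print(clip_reflectivity([-20.0, 12.5, 60.0]))  # [0.0, 12.5, 60.0]
```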
Dataset of cross-layer Radio Access Network (RAN) Key Performance Measurements (KPMs) and protocol stack logs collected on an Open RAN deployment instantiated on Colosseum, with traffic twinned from commercial cellular traces. The dataset includes Base Station (BS)- and User Equipment (UE)-level KPMs from the PHY, MAC, and App layers under different RAN configurations representative of AI/ML control policies, numbers of UEs, and traffic demands. The fine-grained metrics make it possible to relate the PHY- and MAC-layer KPMs measured at the BS and UEs, the control policies, and the end-to-end and App-layer KPMs that reflect user experience.
This dataset contains closed-loop position, velocity, and current trajectories from 83 motor drives, each consisting of a motor and a Harmonic Drive gearbox. It was created to aid in quantifying manufacturing variation at Mecademic Industrial Robotics. The dataset is suitable for system identification and state estimation tasks.
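To illustrate the system-identification use case, here is a minimal sketch that fits a hypothetical first-order discrete-time model, ω[k+1] = a·ω[k] + b·i[k], relating velocity ω to motor current i by ordinary least squares (solved via the 2×2 normal equations). The model structure, function name and list-based data layout are assumptions; a realistic model of a drive with a Harmonic Drive gearbox would also need friction and compliance terms.

```python
# Sketch: ordinary least-squares fit of a hypothetical first-order model
#   omega[k+1] = a * omega[k] + b * current[k]
# from one velocity/current trajectory. Not the dataset authors' method.


def fit_first_order(omega, current):
    """Return (a, b) minimizing sum_k (omega[k+1] - a*omega[k] - b*current[k])^2."""
    n = min(len(omega) - 1, len(current))
    s_ww = s_wi = s_ii = s_wy = s_iy = 0.0
    for k in range(n):
        w, i, y = omega[k], current[k], omega[k + 1]
        # Accumulate entries of the normal equations X^T X and X^T y
        s_ww += w * w
        s_wi += w * i
        s_ii += i * i
        s_wy += w * y
        s_iy += i * y
    det = s_ww * s_ii - s_wi * s_wi
    if abs(det) < 1e-12:
        raise ValueError("regressors are (nearly) collinear: insufficient excitation")
    # Solve [[s_ww, s_wi], [s_wi, s_ii]] @ [a, b] = [s_wy, s_iy] by Cramer's rule
    a = (s_wy * s_ii - s_wi * s_iy) / det
    b = (s_ww * s_iy - s_wi * s_wy) / det
    return a, b
```

On noise-free data generated from the model itself, the fit recovers a and b; on measured trajectories, the residuals indicate how much unmodeled dynamics remain.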
This page contains ARINC 429 message data recorded from a hardware-in-the-loop simulator. The messages were recorded using a SIGLENT SDS2204X Plus oscilloscope sampling at 20 MHz. The intent of this data is to enable cybersecurity research and development for ARINC 429 by providing detailed message data from multiple hardware sources.
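An ARINC 429 word is 32 bits long: label (bits 1–8), Source/Destination Identifier (SDI, bits 9–10), data (bits 11–29), Sign/Status Matrix (SSM, bits 30–31) and an odd-parity bit (32). At 20 MHz, one bit of high-speed (100 kbit/s) traffic spans 200 samples. The sketch below assumes a word has already been recovered from the oscilloscope samples as an integer with bit 1 (the first transmitted bit) as the least significant bit; that convention and the function names are assumptions. Because the label is transmitted most significant bit first, it is bit-reversed before being shown in octal.

```python
# Sketch: split a 32-bit ARINC 429 word into its standard fields.
# Assumption: bit 1 (first transmitted) is the integer's least significant bit.
# Function names are hypothetical, not part of the dataset release.


def odd_parity_ok(word: int) -> bool:
    """ARINC 429 uses odd parity: the 32-bit word must contain an odd number of 1s."""
    return bin(word & 0xFFFFFFFF).count("1") % 2 == 1


def reverse8(byte: int) -> int:
    """Reverse the bit order of an 8-bit value (the label is sent MSB first)."""
    out = 0
    for i in range(8):
        out = (out << 1) | ((byte >> i) & 1)
    return out


def decode_arinc429(word: int) -> dict:
    """Return the label (octal string), SDI, raw data, SSM and parity check."""
    return {
        "label_octal": format(reverse8(word & 0xFF), "o"),  # bits 1-8
        "sdi": (word >> 8) & 0x3,                           # bits 9-10
        "data": (word >> 10) & 0x7FFFF,                     # bits 11-29, raw (BNR/BCD not decoded)
        "ssm": (word >> 29) & 0x3,                          # bits 30-31
        "parity_ok": odd_parity_ok(word),                   # bit 32
    }
```

How the raw data field should be interpreted (BNR, BCD or discrete) depends on the label, so it is left undecoded here.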
The datasets used and analysed from the glucose clamp study are available in this Excel file. They include pseudonymised information on the participants, somatometric data, biomarkers of lipid metabolism and parameters of insulin-glucose homeostasis, i.e. concentrations of insulin, glucose and c-peptide as well as data from glucose-clamp experiments, HOMA, SPINA Carb parameters (SPINA-GBeta and SPINA-GR), Matsuda index, insulinogenic index, disposition index and McAuley index.
The datasets used and analysed from the glucose clamp study are available in this DIF file. They include pseudonymised information on the participants, somatometric data, biomarkers of lipid metabolism and parameters of insulin-glucose homeostasis, i.e. concentrations of insulin, glucose and c-peptide as well as data from glucose-clamp experiments, HOMA, SPINA Carb parameters (SPINA-GBeta and SPINA-GR), Matsuda index, insulinogenic index, disposition index and McAuley index.
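Several of the listed indices have widely used closed-form definitions. The sketch below implements four of them (HOMA1-IR, HOMA1-%B, McAuley index and Matsuda index) under stated unit assumptions. The study may have used different variants (for example the HOMA2 calculator instead of the HOMA1 approximations) or unit conventions. SPINA Carb, insulinogenic and disposition indices are omitted, and the function names are hypothetical.

```python
import math

# Sketch: classic surrogate indices of insulin sensitivity / beta-cell function.
# Assumed units: glucose in mmol/L for HOMA, mg/dL for Matsuda;
# insulin in µU/mL; triglycerides in mmol/L. Not necessarily the study's variants.


def homa_ir(glucose_mmol_l: float, insulin_uU_ml: float) -> float:
    """HOMA1-IR = fasting glucose x fasting insulin / 22.5."""
    return glucose_mmol_l * insulin_uU_ml / 22.5


def homa_b(glucose_mmol_l: float, insulin_uU_ml: float) -> float:
    """HOMA1-%B = 20 x fasting insulin / (fasting glucose - 3.5)."""
    return 20.0 * insulin_uU_ml / (glucose_mmol_l - 3.5)


def mcauley_isi(insulin_uU_ml: float, triglycerides_mmol_l: float) -> float:
    """McAuley index = exp(2.63 - 0.28 ln(fasting insulin) - 0.31 ln(fasting triglycerides))."""
    return math.exp(2.63 - 0.28 * math.log(insulin_uU_ml)
                    - 0.31 * math.log(triglycerides_mmol_l))


def matsuda_index(g0_mg_dl: float, i0_uU_ml: float,
                  g_mean_mg_dl: float, i_mean_uU_ml: float) -> float:
    """Matsuda index = 10000 / sqrt(G0 x I0 x mean G x mean I), from OGTT values."""
    return 10000.0 / math.sqrt(g0_mg_dl * i0_uU_ml * g_mean_mg_dl * i_mean_uU_ml)
```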
This dataset is used to evaluate the accuracy and speed of PerfSim against a real deployment in a Kubernetes cluster running sfc-stress workloads.
WiFiCam dataset for through-wall imaging based on WiFi channel state information. The corresponding source code repository is located at: https://github.com/StrohmayerJ/wificam
The dataset consists of high-resolution three-dimensional (3D) turbulent flow simulations. It captures intricate vortex structures generated by objects of various shapes placed in a channel flow. The dataset was generated using OpenFOAM in large eddy simulation (LES) mode, which resolves the large, energy-containing eddies directly and models only the subgrid scales, preserving detailed turbulent characteristics at the resolved scales.
Temporal Dataset for Indoor and In-Vehicle Thermal Comfort Estimation
Abstract: Thermal comfort estimation is essential for enhancing user experience in static indoor environments and dynamic in-vehicle scenarios. While traditional datasets focus on buildings, their application to fast-changing conditions, such as those in vehicles, remains unexplored. We address this gap by introducing two temporal datasets collected from (1) a self-built climatic chamber with 31 sensor signals and user-labeled ratings from 18 participants and (2) in-vehicle studies with 20 participants in a BMW 3 Series.
This dataset collection includes the three files used for the experiments. Each file contains 6 columns: {timestep, vehicle ID, x coordinate in the map, y coordinate in the map, real bitrate, estimated bitrate}. The datasets, obtained from radio environment maps (REMs) with Gaussian estimation and real (https://ieee-dataport.org/open-access/crawdad-romataxi) or simulated (https://eclipse.dev/sumo/) vehicular mobility, are used in the original paper to optimize federated learning (client scheduling and resource allocation).
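Given the six-column layout above, a natural first check is how far the estimated bitrate deviates from the real one for each vehicle. The sketch below computes a per-vehicle root-mean-square error. The comma delimiter, the optional header row and the function name are assumptions, since the file format is not specified beyond the column list.

```python
import csv
import math
from collections import defaultdict

# Sketch: per-vehicle RMSE between real and estimated bitrate.
# Assumed column order (from the description): timestep, vehicle ID, x, y,
# real bitrate, estimated bitrate. Comma delimiter and optional header are assumptions.


def bitrate_rmse_per_vehicle(lines) -> dict:
    """RMSE between real and estimated bitrate, grouped by vehicle ID."""
    sq_err = defaultdict(float)
    count = defaultdict(int)
    for row in csv.reader(lines):
        if len(row) != 6:
            continue  # skip blank or malformed rows
        try:
            real, est = float(row[4]), float(row[5])
        except ValueError:
            continue  # skip a header row, if present
        vehicle = row[1].strip()
        sq_err[vehicle] += (real - est) ** 2
        count[vehicle] += 1
    return {v: math.sqrt(sq_err[v] / count[v]) for v in count}
```

For a file on disk, pass the open file object directly, e.g. `with open(path, newline="") as f: bitrate_rmse_per_vehicle(f)`, where `path` is whichever of the three files is being checked.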
This dataset curates quantitative transparency disclosures about the online sexual exploitation of minors. In particular, it focuses on legally mandated reports to the national clearinghouse for the United States, the National Center for Missing and Exploited Children (NCMEC), and captures disclosures by electronic service providers as well as NCMEC.
Dataset Overview
The dataset used for training and testing consists of vibration signals for six pump conditions:
Cleaned and preprocessed version of the Kyokushin Karate Motion Dataset by Szczkesna et al. The original dataset and a detailed description can be found here.
TSFM-ScalingLaws-Dataset
Wearanize+ includes overnight sleep data from 130 participants (one night each) using three different wearable devices: Zmax headband, Empatica E4 wristband, and ActivPAL leg patch, alongside full-scale PSG recorded with SomnoScreen Plus and Mentalab Explore Pro. It also includes questionnaires, such as PSQI, MADRE, and PHQ-9, providing information on participants’ sleep, dreams, and overall health. (The link to access the dataset will be added soon).
TRADES-LOB comprises market data simulated with TRADES for Tesla and Intel on 29/01 and 30/01. The dataset is structured into four CSV files, each containing 50 columns. The first six columns describe the order features, followed by 40 columns that represent a snapshot of the limit order book (LOB) across the top 10 levels. The final four columns provide key financial metrics: mid-price, spread, order volume imbalance, and Volume-Weighted Average Price (VWAP), which can be useful for downstream financial tasks such as stock price prediction. In total, the dataset comprises 265,986 rows and 13,299,300 cells (265,986 × 50), which is similar in size to the FI-2010 benchmark dataset.
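The four derived metrics can be recomputed from the raw columns. The sketch below assumes a LOBSTER-style ordering of the 40 LOB columns (ask price, ask size, bid price, bid size for each of the 10 levels), an imbalance defined over total bid and ask depth, and a VWAP computed over a list of (price, volume) executions. All three choices, and the function names, are assumptions: the description does not state which depth or time window TRADES-LOB uses.

```python
# Sketch: mid-price, spread and volume imbalance from one LOB snapshot,
# plus VWAP over executions. Assumed per-level column order (LOBSTER-style):
# ask price, ask size, bid price, bid size. Definitions may differ from TRADES-LOB's.


def lob_features(snapshot, levels: int = 10) -> dict:
    """Summary metrics from a flat snapshot of 4 * levels values."""
    if len(snapshot) != 4 * levels:
        raise ValueError(f"expected {4 * levels} values, got {len(snapshot)}")
    ask_p, ask_v = snapshot[0::4], snapshot[1::4]
    bid_p, bid_v = snapshot[2::4], snapshot[3::4]
    bid_depth, ask_depth = sum(bid_v), sum(ask_v)
    return {
        "mid_price": (ask_p[0] + bid_p[0]) / 2,  # midpoint of best ask and best bid
        "spread": ask_p[0] - bid_p[0],           # best ask minus best bid
        # One common imbalance definition: (bid depth - ask depth) / total depth
        "volume_imbalance": (bid_depth - ask_depth) / (bid_depth + ask_depth),
    }


def vwap(trades) -> float:
    """Volume-weighted average price over (price, volume) pairs."""
    total_volume = sum(v for _, v in trades)
    return sum(p * v for p, v in trades) / total_volume
```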