Shaved Ice Snowflake VM Demand Dataset

This repository contains documentation for the dataset that accompanies our ICPE 2025 paper, "Shaved Ice: Optimal Compute Resource Commitments for Dynamic Multi-Cloud Workloads". It also includes example R and Python notebooks to read and visualize the data, including scripts to reproduce the figures and analysis results in the paper.

This project is archived on Zenodo, an open-access repository, to ensure long-term reproducibility of the research.

Dataset

The dataset contains normalized and obfuscated hourly data on VM demand in four example Snowflake deployments over a three-year period, from February 1, 2021 to January 30, 2024. Each record gives, for one hour, the VM type, the region, and the number of VMs of that type in use in that region. The dataset is available in both compressed CSV and Parquet formats.
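A minimal Python sketch for loading the CSV version with only the standard library follows. It assumes that the CSV header uses the schema field names (Timestamp, VM Type, Region, Count), that timestamps are ISO 8601 strings, and that "compressed" means gzip. The names `DemandRecord`, `parse_records`, and `load_gzipped_csv` are hypothetical, and the sample rows are synthetic, not taken from the dataset. The repository's own notebooks may read the files differently.

```python
import csv
import gzip
import io
from datetime import datetime
from typing import BinaryIO, Iterator, List, NamedTuple, TextIO, Union


class DemandRecord(NamedTuple):
    """One row: VM type, region, and normalized VM count for one hour."""
    timestamp: datetime
    vm_type: str
    region: int
    count: int


def parse_records(stream: TextIO) -> Iterator[DemandRecord]:
    """Yield typed records from CSV text.

    Assumes (hypothetically) a header row named after the schema fields
    and ISO 8601 timestamps such as "2021-02-01 00:00:00".
    """
    for row in csv.DictReader(stream):
        yield DemandRecord(
            timestamp=datetime.fromisoformat(row["Timestamp"]),
            vm_type=row["VM Type"],
            region=int(row["Region"]),
            count=int(row["Count"]),
        )


def load_gzipped_csv(source: Union[str, BinaryIO]) -> List[DemandRecord]:
    """Read a gzip-compressed CSV from a path or a binary file object.

    Gzip is an assumption about what "compressed CSV" means here.
    """
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as f:
        return list(parse_records(f))


# Synthetic sample rows for illustration only (not real dataset values).
SAMPLE_CSV = (
    "Timestamp,VM Type,Region,Count\n"
    "2021-02-01 00:00:00,A,1,412\n"
    "2021-02-01 01:00:00,B,3,87\n"
)

if __name__ == "__main__":
    records = list(parse_records(io.StringIO(SAMPLE_CSV)))
    print(records[0].vm_type, records[0].count)  # A 412
    compressed = io.BytesIO(gzip.compress(SAMPLE_CSV.encode("utf-8")))
    print(load_gzipped_csv(compressed)[1].region)  # 3
```

For the Parquet files, a dataframe library such as pandas (`pandas.read_parquet`) is the more usual route; the plain `csv` version above just avoids third-party dependencies.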

Schema

Timestamp: An hourly timestamp for the record.
VM Type: The type of VM. This field is obfuscated: each precise VM identifier from the cloud service provider is mapped to a capital letter.
Region: The region where the VMs were deployed. This field is obfuscated: each precise region name from the cloud service provider is mapped to a number between 1 and 4.
Count: The number of VMs of the specified type in the specified region during the specified hour. This field is normalized per region: the largest count among all (VM type, hour) pairs in a region is set to 1000, and every other count in that region is scaled linearly and rounded to the nearest whole number.
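The per-region normalization described for Count can be illustrated with a short Python sketch. The raw input layout (hour, VM type, region, raw count) and the function name `normalize_per_region` are hypothetical; the authors' actual preprocessing code is not described here. Python's built-in `round` breaks exact .5 ties toward the even integer, and which tie-breaking rule the dataset used is not stated.

```python
from typing import Dict, List, Tuple

# Hypothetical raw row layout: (hour, vm_type, region, raw_count).
RawRow = Tuple[str, str, int, float]
NormRow = Tuple[str, str, int, int]


def normalize_per_region(rows: List[RawRow], scale: int = 1000) -> List[NormRow]:
    """Rescale counts so the largest count in each region becomes `scale`.

    Every other count in the same region is multiplied by the same factor
    and rounded to the nearest whole number (round-half-to-even on ties).
    """
    # First pass: find the peak raw count per region.
    peak: Dict[int, float] = {}
    for _, _, region, count in rows:
        peak[region] = max(peak.get(region, 0.0), count)

    # Second pass: apply a single linear factor per region.
    normalized: List[NormRow] = []
    for hour, vm_type, region, count in rows:
        if peak[region] == 0:
            value = 0  # avoid dividing by zero in an all-zero region
        else:
            value = round(count * scale / peak[region])
        normalized.append((hour, vm_type, region, value))
    return normalized


if __name__ == "__main__":
    # Synthetic example: region 1 peaks at 200, region 2 peaks at 7.
    raw = [("h0", "A", 1, 50), ("h1", "A", 1, 200), ("h0", "B", 1, 33), ("h0", "C", 2, 7)]
    print(normalize_per_region(raw))
    # [('h0', 'A', 1, 250), ('h1', 'A', 1, 1000), ('h0', 'B', 1, 165), ('h0', 'C', 2, 1000)]
```

Because each region gets its own scaling factor, a Count of 500 in region 1 and a Count of 500 in region 2 need not correspond to the same number of physical VMs. Counts are comparable within a region, not across regions.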

Potential Use Cases

The dataset provides realistic industry data for further research into cloud demand forecasting, commitment optimization, and capacity planning.
