298 machine learning datasets
Nearly 10,000 km² of free high-resolution and paired multi-temporal low-resolution satellite imagery of unique locations which ensure stratified representation of all types of land-use across the world: from agriculture to ice caps, from forests to multiple urbanization densities.
ChangeSim is a dataset aimed at online scene change detection (SCD) and more. The data is collected in photo-realistic simulation environments with the presence of environmental non-targeted variations, such as air turbidity and light condition changes, as well as targeted object changes in industrial indoor environments. By collecting data in simulations, multi-modal sensor data and precise ground truth labels are obtainable such as the RGB image, depth image, semantic segmentation, change segmentation, camera poses, and 3D reconstructions. While the previous online SCD datasets evaluate models given well-aligned image pairs, ChangeSim also provides raw unpaired sequences that present an opportunity to develop an online SCD model in an end-to-end manner, considering both pairing and detection. Experiments show that even the latest pair-based SCD models suffer from the bottleneck of the pairing process, and it gets worse when the environment contains the non-targeted variations.
Engine degradation simulation was carried out using C-MAPSS. Four different sets were simulated under different combinations of operational conditions and fault modes. Several sensor channels were recorded to characterize fault evolution. The data set was provided by the Prognostics CoE at NASA Ames.
SEN12MS-CR-TS is a multi-modal and multi-temporal data set for cloud removal. It contains time series of paired and co-registered Sentinel-1 data and cloudy as well as cloud-free Sentinel-2 data from the European Space Agency's Copernicus mission. Each time series contains 30 cloudy and clear observations regularly sampled throughout the year 2018. Our multi-temporal data set is readily pre-processed and backward-compatible with SEN12MS-CR.
We introduce a new dataset, Watch and Learn Time-lapse (WALT), consisting of multiple (4K and 1080p) cameras capturing urban environments over a year.
Prediction of Finger Flexion, IV Brain-Computer Interface Data Competition: The goal of this dataset is to predict the flexion of individual fingers from signals recorded from the surface of the brain (electrocorticography, ECoG). This data set contains brain signals from three subjects, as well as the time courses of the flexion of each of five fingers. The task in this competition is to use the provided flexion information in order to predict finger flexion for a provided test set. The performance of the classifier will be evaluated by calculating the average correlation coefficient r between actual and predicted finger flexion.
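The evaluation metric named above can be sketched as follows. This is a minimal illustration, not the competition's official scoring code: it assumes that r is the Pearson correlation coefficient and that the average is an unweighted mean over the individual flexion traces. The function names and the toy data are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_p)


def average_correlation(
    actual_traces: Sequence[Sequence[float]],
    predicted_traces: Sequence[Sequence[float]],
) -> float:
    """Unweighted mean of per-trace correlations (assumed averaging scheme)."""
    if len(actual_traces) != len(predicted_traces) or not actual_traces:
        raise ValueError("need the same non-zero number of traces on both sides")
    scores = [pearson_r(a, p) for a, p in zip(actual_traces, predicted_traces)]
    return sum(scores) / len(scores)


# Toy data: the first trace is predicted up to a scale factor,
# the second is predicted with the trend reversed.
toy_actual = [[0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0]]
toy_predicted = [[0.0, 2.0, 4.0, 6.0], [0.0, 1.0, 2.0, 3.0]]
toy_score = average_correlation(toy_actual, toy_predicted)
```

Because correlation ignores scale and offset, a prediction with the right shape but the wrong amplitude still scores well under this metric, while a reversed trend is penalized.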
A multivariate spatio-temporal benchmark dataset for meteorological forecasting based on real-time observation data from ground weather stations.
This dataset contains simulations of a complex, large-scale chemical plant proposed by Downs and Vogel (1993). As described by Reinartz, Kulahci and Ravn (2021):
The odometry benchmark consists of 22 stereo sequences, saved in lossless PNG format. We provide 11 sequences (00-10) with ground truth trajectories for training and 11 sequences (11-21) without ground truth for evaluation. For this benchmark you may provide results using monocular or stereo visual odometry, laser-based SLAM or algorithms that combine visual and LIDAR information. The only restrictions we impose are that your method is fully automatic (e.g., no manual loop-closure tagging is allowed) and that the same parameter set is used for all sequences. A development kit provides details about the data format. More details are available at: https://www.cvlibs.net/datasets/kitti/eval_odometry.php.
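The split described above can be written down as a small sketch. The sequence numbering (00-10 with ground truth, 11-21 without) comes from the description; the two-digit string IDs and the helper name are assumptions made for illustration and are not part of the development kit.

```python
# Sequence IDs as two-digit strings (an assumed convention for this sketch).
TRAIN_SEQUENCES = [f"{i:02d}" for i in range(0, 11)]   # 00-10, ground truth provided
TEST_SEQUENCES = [f"{i:02d}" for i in range(11, 22)]   # 11-21, no ground truth


def has_ground_truth(sequence_id: str) -> bool:
    """Return True if the sequence belongs to the training split (hypothetical helper)."""
    return sequence_id in TRAIN_SEQUENCES
```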
The generation of data-driven prognostics models requires the availability of datasets with run-to-failure trajectories. To contribute to the development of these methods, this dataset provides new, realistic run-to-failure trajectories for a small fleet of aircraft engines under realistic flight conditions. The damage propagation modelling used for the generation of this synthetic dataset builds on the modeling strategy from previous work. The dataset was generated with the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dynamical model. The data set has been provided by the Prognostics CoE at NASA Ames in collaboration with ETH Zurich and PARC.
The PRONOSTIA (also called FEMTO) bearing dataset consists of 17 accelerated run-to-failure experiments on a small bearing test rig. Both acceleration and temperature data were collected for each experiment.
Data Description: The training data contain twelve-lead ECGs. The validation and test data contain twelve-lead, six-lead, four-lead, three-lead, and two-lead ECGs:
AnoShift is a large-scale anomaly detection benchmark that splits the test data based on its temporal distance to the training set, introducing three testing splits: IID, NEAR, and FAR. This testing scenario captures the performance degradation over time of anomaly detection methods, ranging from classical approaches to masked language models.
Smart meter roll-outs provide easy access to granular meter measurements, enabling advanced energy services, ranging from demand response measures and tailored energy feedback to smart home/building automation. To design such services and to train and validate models, access to data that resembles what is expected of smart meters, collected in a real-world setting, is necessary. The REFIT electrical load measurements dataset described in this paper includes whole house aggregate loads and nine individual appliance measurements at 8-second intervals per house, collected continuously over a period of two years from 20 houses. During monitoring, the occupants were conducting their usual routines. At the time of publishing, the dataset has the largest number of houses monitored in the United Kingdom at less than 1-minute intervals over a period greater than one year. The dataset comprises 1,194,958,790 readings that represent over 250,000 monitored appliance uses. The data is accessible in an eas
The exiD dataset introduces a groundbreaking collection of naturalistic road user trajectories at highway entries and exits in Germany, meticulously captured with drones to navigate past the limitations of conventional traffic data collection methods, such as occlusions. This approach not only allows for the precise extraction of each road user’s trajectory and type but also ensures very high positional accuracy, thanks to sophisticated computer vision algorithms. Its innovative data collection technique minimizes errors and maximizes the quality and reliability of the dataset, making it a valuable resource for advanced research and development in the field of automated driving technologies.
Unified Time Series Dataset (UTSD) includes 7 domains with up to 1 billion time points with hierarchical capacities to facilitate research of large models in the field of time series. It is meticulously assembled from a blend of publicly accessible online data repositories and empirical data derived from real-world machine operations. We analyze each dataset within the collection, examining the time series through the lenses of stationarity and forecastability, which allows us to characterize the level of complexity inherent to each dataset.
Floods are among the most common and devastating natural hazards, imposing immense costs on our society and economy due to their disastrous consequences. Recent progress in weather prediction and spaceborne flood mapping demonstrated the feasibility of anticipating extreme events and reliably detecting their catastrophic effects afterwards. However, these efforts are rarely linked to one another and there is a critical lack of datasets and benchmarks to enable the direct forecasting of flood extent. To resolve this issue, we curate a novel dataset enabling a timely prediction of flood extent. Furthermore, we provide a representative evaluation of state-of-the-art methods, structured into two benchmark tracks for forecasting flood inundation maps i) in general and ii) focused on coastal regions. Altogether, our dataset and benchmark provide a comprehensive platform for evaluating flood forecasts, enabling future solutions for this critical challenge. Data, code & models are shared a
The Spatial Dynamic Wind Power Forecasting dataset (SDWPF) includes the spatial distribution of wind turbines as well as dynamic context factors. Most existing datasets cover only a small number of wind turbines and lack location and context information for the turbines at a fine-grained time scale. By contrast, SDWPF provides the wind power data of 134 wind turbines from a wind farm over half a year, together with their relative positions and internal statuses.
IowaRain is a dataset of rainfall events for the state of Iowa (2016-2019) acquired from the National Weather Service Next Generation Weather Radar (NEXRAD) system and processed by a quantitative precipitation estimation system. The dataset presented in this study could be used for better disaster monitoring, response and recovery by paving the way for both predictive and prescriptive modeling