298 machine learning datasets
298 dataset results
HASCD (Human Activity Segmentation Challenge Dataset) contains 250 annotated multivariate time series capturing 10.7 h of real-world human motion smartphone sensor data from 15 bachelor computer science students. The recordings capture 6 distinct human motion sequences designed to represent pervasive behaviour in realistic indoor and outdoor settings. The data set serves as a benchmark for evaluating machine learning workflows.
The Mauna Loa Seeing Study was performed by the EOL/Integrated Surface Flux System team, capturing surface meteorology and flux products at the Mauna Loa Observatory in Hawaii.
The USNA long-term scintillation study is a continuing effort to characterize and measure optical turbulence in the near-maritime boundary layer.
The USNA long-term scintillation study is a continuing effort to characterize and measure optical turbulence in the near-maritime boundary layer.
The availability of well-curated datasets has driven the success of Machine Learning (ML) models. Despite greater access to earth observation data in agriculture, there is a scarcity of curated and labelled datasets, which limits the potential of its use in training ML models for remote sensing (RS) in agriculture. To this end, we introduce a first-of-its-kind dataset called SICKLE, which constitutes a time-series of multi-resolution imagery from 3 distinct satellites: Landsat-8, Sentinel-1 and Sentinel-2. Our dataset constitutes multi-spectral, thermal and microwave sensors during January 2018 - March 2021 period. We construct each temporal sequence by considering the cropping practices followed by farmers primarily engaged in paddy cultivation in the Cauvery Delta region of Tamil Nadu, India; and annotate the corresponding imagery with key cropping parameters at multiple resolutions (i.e. 3m, 10m and 30m). Our dataset comprises 2, 370 season-wise samples from 388 unique plots, having
This dataset is a BIDS-compatible version of the CHB-MIT Scalp EEG Database. It reorganizes the file structure to comply with the BIDS specification. To this effect:
This dataset is a BIDS compatible version of the Siena Scalp EEG Database. It reorganizes the file structure to comply with the BIDS specification. To this effect:
The database consists of EEG recordings of 14 patients acquired at the Unit of Neurology and Neurophysiology of the University of Siena. Subjects include 9 males (ages 25-71) and 5 females (ages 20-58). Subjects were monitored with a Video-EEG with a sampling rate of 512 Hz, with electrodes arranged on the basis of the international 10-20 System. Most of the recordings also contain 1 or 2 EKG signals. The diagnosis of epilepsy and the classification of seizures according to the criteria of the International League Against Epilepsy were performed by an expert clinician after a careful review of the clinical and electrophysiological data of each patient.
This dataset is obtained during an ICON project (2017-2018) in collaboration with KU Leuven (ESAT-STADIUS), UZ Leuven, UCB, Byteflies and Pilipili. The goal of this project was to design a system using Behind the ear (bhE) EEG electrodes for monitoring the patient in a home environment. This way, a nice balance can be found between sufficient accuracy of seizure detection algorithms (because EEG is used) and wearability (bhe EEG is relatively subtle, similar to a hear-aid device). The dataset acquired in the hospital during presurgical evaluation. During such presurgical evaluation, neurologists try to see if a specific part of the brain is causing the seizures, and if so, if that part of the brain can be removed during surgery. During the presurgical evaluation, patients are monitored using the vEEG for multiple days (typically a week). Patients are however restricted to move within their room because of the wiring and video analysis. In this dataset, following data is available per p
The uniD dataset is an innovative collection of naturalistic road user trajectories, captured within the RWTH Aachen University campus using drone technology to address common challenges such as occlusions found in traditional traffic data collection methods. It meticulously documents the movement and classifies each road user by type. Employing cutting-edge computer vision algorithms, the dataset ensures high positional accuracy. Its utility spans various applications, from predicting road user behavior and modeling driver actions to conducting scenario-based safety checks for automated driving systems and facilitating the data-driven design of Highly Automated Driving (HAD) system components.
The PRONTO heterogeneous benchmark dataset is based on an industrial-scale multiphase flow facility. It includes data from heterogeneous sources, including process measurements, alarm records, high frequency ultrasonic flow and pressure measurements, an operation log and video recordings. The study collected data from various operational conditions with and without induced faults to generate a multi-rate, multi-modal dataset. The dataset is suitable for developing and validating algorithms for fault detection and diagnosis (FDD) and data fusion.
Recorded with a Husky A200 wheeled UGV, BorealTC contains 116 min of Inertial Measurement Unit (IMU), motor current, and wheel odometry data, focusing on typical boreal forest terrains, notably snow, ice, and silty loam. The dataset also includes experiments on asphalt and flooring. All runs were recorded in Forêt Montmorency and on the main campus of Université Laval, Quebec City, Québec, Canada
Recorded with a Husky A200 wheeled UGV, the Vulpi 2021 dataset contains 13 min of Inertial Measurement Unit (IMU), motor current, and wheel odometry data, focusing on agricultural terrains. The dataset includes experiments on concrete, a dirt road, a ploughed terrain and an unploughed terrain that were all recorded on an experimental farm in San Cassiano, Lecce, Italy.
Click to add a brief description of the dataset (Markdown and LaTeX enabled).
The sports industry is witnessing an increasing trend of utilizing multiple synchronized sensors for player data collection, enabling personalized training systems with multi-perspective real-time feedback. Badminton could benefit from these various sensors, but there is a scarcity of comprehensive badminton action datasets for analysis and training feedback. Addressing this gap, this paper introduces a multi-sensor badminton dataset for forehand clear and backhand drive strokes, based on interviews with coaches for optimal usability. The dataset covers various skill levels, including beginners, intermediates, and experts, providing resources for understanding biomechanics across skill levels. It encompasses 7,763 badminton swing data from 25 players, featuring sensor data on eye tracking, body tracking, muscle signals, and foot pressure. The dataset also includes video recordings, detailed annotations on stroke type, skill level, sound, ball landing, and hitting location, as well as s
An image sequence dataset of growing snowflakes in HDF5 format. Generated by the Gravner-Griffeath LCA model for snow crystal growth. Useful for modeling crystal growth with neural networks.
ESA Anomaly Dataset is the first large-scale, real-life satellite telemetry dataset with curated anomaly annotations originated from three ESA missions. We hope that this unique dataset will allow researchers and scientists from academia, research institutes, national and international space agencies, and industry to benchmark models and approaches on a common baseline as well as research and develop novel, computational-efficient approaches for anomaly detection in satellite telemetry data.
Release for uploading scripts and data to Zenodo
This dataset is made of 6366 threads collected from the r/AmITheAsshole community on Reddit. The dataset contains a total of 6,372,251 comments. The collected threads constitute the “top” submissions — those having the highest score, measured as the difference between upvotes and downvotes of a post. We downloaded them using PRAW, running 10 different queries across various temporal scopes, and then cleaning the obtained dataset by removing duplicated threads. Please refer to the paper, specifically to Table 3, for more details about the dataset.
The primary environmental health threat in the WHO European Region is air pollution, impacting the daily health and well-being of its citizens significantly. To effectively understand the impact, and dynamics of air quality a detailed investigation of different environmental, weather, and land cover indices is appropriate. To this end, this paper introduces three European cities’ spatiotemporal datasets, customized for air pollution monitoring at a regional level. The datasets are composed of major air quality, weather measurements and land use information. The duration is approximately from 2020 to 2023 with an hourly temporal resolution and a spatial resolution of 0.005°. The temporal and spatiotemporal datasets are publicly released aiming to provide a solid foundation for researchers, analysts, and practitioners to conduct in-depth analyses of air pollution dynamics.