298 machine learning datasets
The Wallhack1.8k dataset comprises 1,806 CSI amplitude spectrograms (and the corresponding raw WiFi packet time series) covering three activity classes: "no presence," "walking," and "walking + arm-waving." WiFi packets were transmitted at 100 Hz, and each spectrogram captures a temporal context of approximately 4 seconds (400 WiFi packets).
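The 4-second context follows directly from the packet rate: 400 packets / 100 Hz = 4 s. The sketch below shows one way to cut a raw packet time series into windows of that size. It assumes the amplitudes for one subcarrier are held as a flat list of per-packet values. The helper `window_packets` and its non-overlapping default stride are hypothetical, not the dataset's documented preprocessing.

```python
from typing import List, Sequence

PACKET_RATE_HZ = 100   # packets per second, as stated for Wallhack1.8k
WINDOW_PACKETS = 400   # packets per spectrogram (~4 s of context)


def window_packets(series: Sequence[float],
                   window: int = WINDOW_PACKETS,
                   stride: int = WINDOW_PACKETS) -> List[List[float]]:
    """Split a per-packet series into fixed-length windows.

    Hypothetical helper: with the default stride equal to the window length,
    windows do not overlap, and trailing packets that cannot fill a whole
    window are dropped.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [list(series[start:start + window])
            for start in range(0, len(series) - window + 1, stride)]


def window_seconds(window: int = WINDOW_PACKETS,
                   rate_hz: int = PACKET_RATE_HZ) -> float:
    """Temporal context covered by one window, in seconds."""
    return window / rate_hz


if __name__ == "__main__":
    packets = [float(i) for i in range(1000)]   # 10 s of dummy packet data
    windows = window_packets(packets)
    print(len(windows), window_seconds())       # 2 4.0
```

Passing a smaller `stride` (for example 200) produces overlapping windows, which is a common way to get more training samples from the same recording.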
The dataset contains time-stamped user product-review behavior from January 2008 to October 2018. Each user has a sequence of product-review events; each event records the timestamp and the category of the reviewed product, and each category corresponds to an event type.
The dataset covers two years of badge awards to users of a question-answering website: each user received a sequence of badges, and there are 22 different kinds of badges in total.
The dataset contains historical financial transactions with time, category, and cost fields. There are 50,000 clients, 205 categories, and 43.7M events. The original goal was to predict each client's age group; in this variant of the dataset, the goal is to forecast multiple future events.
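Forecasting multiple future events typically starts by turning the flat transaction log into one time-ordered sequence per client and holding out the last few events as targets. The sketch below illustrates that framing. The `Transaction` field names and the `horizon` parameter are assumptions for illustration, not the dataset's actual schema or evaluation setup.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Transaction:
    # Hypothetical field names; the real column names may differ.
    client_id: int
    time: float
    category: int
    cost: float


def group_by_client(rows: Iterable[Transaction]) -> Dict[int, List[Transaction]]:
    """Collect each client's transactions into a time-ordered sequence."""
    sequences: Dict[int, List[Transaction]] = defaultdict(list)
    for row in rows:
        sequences[row.client_id].append(row)
    for seq in sequences.values():
        seq.sort(key=lambda t: t.time)
    return dict(sequences)


def split_history_future(seq: List[Transaction],
                         horizon: int) -> Tuple[List[Transaction], List[Transaction]]:
    """Use all but the last `horizon` events as input and the last `horizon`
    events as the multi-event forecasting target."""
    if horizon <= 0 or horizon >= len(seq):
        raise ValueError("horizon must be between 1 and len(seq) - 1")
    return seq[:-horizon], seq[-horizon:]


if __name__ == "__main__":
    rows = [
        Transaction(1, 3.0, 7, 12.5),
        Transaction(1, 1.0, 2, 40.0),
        Transaction(2, 2.0, 7, 5.0),
        Transaction(1, 2.0, 5, 3.0),
    ]
    seqs = group_by_client(rows)
    history, future = split_history_future(seqs[1], horizon=2)
    print([t.category for t in history], [t.category for t in future])  # [2] [5, 7]
```

Sorting within each client guards against logs that are not already time-ordered.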
Annotated audio recordings of lung sounds (with a separate combined annotation file), recorded from various vantage points on the chest wall. The annotation includes the sound type (I: Inspiratory, E: Expiratory, W: Wheezes, C: Crackles, N: Normal), the diagnosis as determined by a specialist (asthma, COPD, BRON, heart failure, lung fibrosis, etc.), and the location on the chest wall from which the recording was taken (P: Posterior, L: Lower, L: Left, R: Right, U: Upper, A: Anterior, M: Middle). The audio file names encode: 1. the filter type (B: Bell, 20-200 Hz; Diaphragm, 100-500 Hz; Extended range, 50-500 Hz); 2. the patient number (P1-P112).
The WiFall dataset contains data for fall detection, action recognition, and person identification in a meeting-room scenario. The dataset provides synchronised CSI, RSSI, and a timestamp for each sample.
This dataset is used to test digital twin-supported deep learning for fault diagnosis. It contains:
- A digital twin model of a robot.
- Synthetic data generated from the digital twin, used to train a deep learning-based fault diagnosis model.
- A real dataset collected from the physical robot, used to test sim-to-real performance.
Download the dataset from: https://nextcloud.centralesupelec.fr/s/7AR6aamBZNXcRM8/download
A dataset of multi-modal signals from wearable devices worn at four sites on the body. Each device continuously recorded synchronized signals from a 3-channel reflective photoplethysmogram (PPG; red, green, and infrared), a 3-axis inertial sensor (accelerometer), a temperature sensor, and a barometric altitude sensor. For reference, the sternum device also continuously recorded a Lead-I electrocardiogram (ECG) from body-mounted gel electrodes to provide ground-truth heart rate (HR) estimates.
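Ground-truth HR from an ECG is usually derived from the intervals between successive R peaks: HR (bpm) = 60 / mean R-R interval (s). The sketch below illustrates that relationship. It uses a deliberately naive peak detector, a thresholded local maximum with a refractory gap. The function names, the fixed threshold, and the 0.25 s refractory period are all hypothetical. This is not the dataset's method, and real pipelines use more robust detectors such as Pan-Tompkins.

```python
from typing import List, Sequence


def detect_r_peaks(ecg: Sequence[float], fs: float, threshold: float,
                   refractory_s: float = 0.25) -> List[int]:
    """Naive R-peak detector (illustrative only).

    A sample counts as a peak if it exceeds `threshold`, is a local maximum,
    and lies at least `refractory_s` seconds after the previous accepted peak.
    """
    min_gap = int(refractory_s * fs)
    peaks: List[int] = []
    for i in range(1, len(ecg) - 1):
        if ecg[i] > threshold and ecg[i] >= ecg[i - 1] and ecg[i] > ecg[i + 1]:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks


def heart_rate_bpm(peaks: Sequence[int], fs: float) -> float:
    """Mean heart rate from R-peak sample indices.

    The mean R-R interval telescopes to (last - first) / (n - 1) samples,
    so HR = 60 * fs * (n - 1) / (last - first).
    """
    if len(peaks) < 2:
        raise ValueError("need at least two R peaks")
    return 60.0 * fs * (len(peaks) - 1) / (peaks[-1] - peaks[0])


if __name__ == "__main__":
    fs = 250.0                     # assumed sampling rate, Hz
    ecg = [0.0] * 1000
    for i in range(100, 1000, 200):
        ecg[i] = 1.0               # synthetic spike every 0.8 s
    peaks = detect_r_peaks(ecg, fs, threshold=0.5)
    print(heart_rate_bpm(peaks, fs))   # 75.0
```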
The dataset contains RGB and depth images and data from two accelerometers, together with ground-truth calorie values from a calorimeter, for calorie expenditure estimation in home environments.
VLUC (Video-Like Urban Computing) is a benchmark for video-like computing on citywide traffic density and crowd prediction. It consists of two new datasets, BousaiTYO and BousaiOSA, and the existing datasets TaxiBJ, BikeNYC I-II, and TaxiNYC.
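"Video-like" here means that each time step is a citywide grid of values (a "frame"), so prediction can be posed like video frame prediction: past frames in, future frames out. The sketch below builds such input/target pairs with a sliding window. The helper name and the `n_in`/`n_out` settings are assumptions, and the benchmark's own sample construction may differ.

```python
from typing import List, Sequence, Tuple, TypeVar

# One time step, e.g. an H x W grid of density values (hypothetical layout).
Frame = TypeVar("Frame")


def make_sequence_samples(frames: Sequence[Frame], n_in: int, n_out: int = 1
                          ) -> List[Tuple[List[Frame], List[Frame]]]:
    """Sliding-window (past frames -> future frames) pairs.

    Sample t uses frames[t - n_in : t] as input and frames[t : t + n_out]
    as the prediction target.
    """
    if n_in < 1 or n_out < 1:
        raise ValueError("n_in and n_out must be at least 1")
    samples = []
    for t in range(n_in, len(frames) - n_out + 1):
        samples.append((list(frames[t - n_in:t]), list(frames[t:t + n_out])))
    return samples


if __name__ == "__main__":
    # Ten 2 x 2 frames whose cells all hold the time index t.
    frames = [[[float(t)] * 2 for _ in range(2)] for t in range(10)]
    samples = make_sequence_samples(frames, n_in=4, n_out=2)
    print(len(samples))   # 5
```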
This is a benchmark dataset of high-frequency limit order book data for mid-price forecasting. The authors extracted normalized time series representations for five stocks from the NASDAQ Nordic stock market over ten consecutive days, yielding roughly 4,000,000 time series samples in total. A day-based anchored cross-validation protocol is also provided as a benchmark for comparing the performance of state-of-the-art methods.
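The mid-price is the average of the best ask and best bid prices. An anchored cross-validation scheme keeps the start of the training window fixed and grows it one day per fold. The sketch below shows both ideas. It assumes that fold k trains on days 0..k-1 and tests on day k, which is how an anchored (expanding-window) split is commonly defined. The dataset's exact fold boundaries are not stated here, and the function names are hypothetical.

```python
from typing import List, Tuple


def mid_price(best_ask: float, best_bid: float) -> float:
    """Mid-price: average of the best ask and best bid prices."""
    return (best_ask + best_bid) / 2.0


def anchored_day_splits(n_days: int) -> List[Tuple[List[int], int]]:
    """Anchored (expanding-window) splits over consecutive trading days.

    Fold k trains on days 0..k-1 and tests on day k, so training always
    starts at day 0 and never includes data later than the test day.
    """
    if n_days < 2:
        raise ValueError("need at least two days")
    return [(list(range(k)), k) for k in range(1, n_days)]


if __name__ == "__main__":
    print(mid_price(101.0, 100.0))       # 100.5
    for train_days, test_day in anchored_day_splits(10):
        print(f"train on days {train_days[0]}-{train_days[-1]}, test on day {test_day}")
```

With ten days this yields nine folds. Unlike shuffled k-fold, no fold ever trains on data from after its test day.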
Experiments on a metal milling machine at different speeds, feeds, and depths of cut, recording the flank wear (VB) of the milling insert. The dataset was provided by the BEST lab at UC Berkeley.
Boombox is a multi-modal dataset for visual reconstruction from acoustic vibrations. Objects are dropped into a box, and the resulting images and vibrations are captured. The dataset is used to train ML systems that predict images from vibrations.
The RBO dataset of articulated objects and interactions is a collection of 358 RGB-D video sequences (67:18 minutes) of humans manipulating 14 articulated objects under varying conditions (light, perspective, background, interaction). All sequences are annotated with ground-truth poses of the rigid parts and the kinematic state of the articulated object (joint states), obtained with a motion capture system. We also provide complete kinematic models of these objects (kinematic structure and three-dimensional textured shape models). In 78 sequences, the contact wrenches during manipulation are also provided.
This is the complete set of data we collected and analyzed in our study "Quo Vadis, Open Source? The Limits of Open Source Growth". Please see our GitHub repository for details and the toolchain.
The ARPA-E-funded TERRA-REF project is generating open-access reference datasets for the study of plant sensing, genomics, and phenomics. Sensor data were generated by a field scanner sensing platform that captures color, thermal, hyperspectral, and active fluorescence imagery, as well as three-dimensional structure and associated environmental measurements. This dataset is provided alongside data collected using traditional field methods to support calibration and validation of algorithms that extract plot-level phenotypes from these datasets.
This private dataset was collected for automatic analysis of psychological distress. It consists of 30-minute interview recordings of human volunteers, with self-reported distress labels provided by the participants.
A US macroeconomic dataset containing 14 time series of monthly observations. The series vary in length but all end in 1988. The variables are: consumer price index, industrial production, nominal GNP, velocity, employment, interest rate, nominal wages, GNP deflator, money stock, real GNP, stock prices (S&P 500), GNP per capita, real wages, and unemployment.
Predictions of energy consumption are crucial for energy retailers to minimize deviations between the energy purchased in the day-ahead market and the actual consumption of their customers. The increasing spread of smart meters gives retailers real-time access to hourly consumption values for all of their contracted customers. Using machine learning algorithms, these hourly values can be used to predict the customers' future energy consumption. The present dataset supports the training and validation of AI-based prediction models.
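Before training an ML model on hourly smart-meter data, it helps to have a simple reference forecast to beat. The sketch below uses a seasonal-naive baseline, a common generic choice rather than anything prescribed by this dataset. It predicts the next day's 24 hourly values as the values observed at the same hours one week earlier and scores the result with mean absolute error. The function names and the weekly default season are assumptions.

```python
from typing import List, Sequence

HOURS_PER_DAY = 24


def seasonal_naive_forecast(hourly: Sequence[float],
                            season: int = 7 * HOURS_PER_DAY,
                            horizon: int = HOURS_PER_DAY) -> List[float]:
    """Forecast the next `horizon` hours by repeating the values observed
    `season` hours earlier (default: the same hour one week ago)."""
    if horizon > season:
        raise ValueError("horizon must not exceed the seasonal period")
    if len(hourly) < season:
        raise ValueError("need at least one full season of history")
    start = len(hourly) - season
    return list(hourly[start:start + horizon])


def mean_absolute_error(actual: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """Average absolute deviation between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Two weeks of synthetic hourly data with a repeating daily profile.
    history = [float(h % HOURS_PER_DAY) for h in range(14 * HOURS_PER_DAY)]
    forecast = seasonal_naive_forecast(history)
    actual_next_day = [float(h) for h in range(HOURS_PER_DAY)]
    print(mean_absolute_error(actual_next_day, forecast))   # 0.0
```

A weekly season captures weekday/weekend differences that a previous-day baseline (`season=24`) would miss.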
The original paper gives a high-level explanation of the dataset's characteristics and its potential use cases. ArchABM can help quantify the impact of building- and company-policy-related measures.