298 machine learning datasets
The SNS data (Valente et al., 2013) is a four-wave survey conducted in Los Angeles County, United States, with a sample of 1,795 high-school students. The survey covered students in grades 10 to 12, the majority of whom self-identified as Hispanic. The collected information includes socio-economic status, demographics, social networks, and substance use (consumption of alcohol, tobacco, and marijuana).
Outliers or anomalies are instances that do not conform to the norm of a dataset. Outlier detection is an important data mining problem that has been researched within diverse research areas and application domains such as intrusion detection, fraud detection, unusual event detection, disease condition detection, etc.
FedTADBench is a federated time series anomaly detection benchmark. It covers 5 time series anomaly detection algorithms, 4 federated learning frameworks, and 3 time series anomaly detection datasets.
The dataset includes static tension measurements under a 2 kg load at different points of the CB, as well as measurements under dynamic conditions. The dynamic conditions covered linear belt speeds ranging from nu_1 = 0.5 to nu_max = 1.7 m/s. A unified sampling frequency of 400 Hz was used for the experiments, which corresponded to 140 samples.
This dataset provides wireless measurements from two industrial testbeds: iV2V (industrial Vehicle-to-Vehicle) and iV2I+ (industrial Vehicular-to-Infrastructure plus sensor).
Dataset for User Verification part of MotionID: Human Authentication Approach. Data type: bin (should be converted by attached notebook). ~50 hours of IMU (Inertial Measurement Units) data for one specific motion pattern, provided by 101 users.
Dataset (part 1/3) for Motion Patterns Identification part of MotionID: Human Authentication Approach. Data type: bin (should be converted by attached notebook).
Dataset (part 2/3) for Motion Patterns Identification part of MotionID: Human Authentication Approach. Data type: bin (should be converted by attached notebook).
Dataset (part 3/3) for Motion Patterns Identification part of MotionID: Human Authentication Approach. Data type: bin (should be converted by attached notebook).
This repository contains a dataset and machine learning algorithms to distinguish poisoned water from clean water using equivalent smartphone-embedded Wi-Fi CSI data.
The field of biomechanics is at a turning point, with marker-based motion capture set to be replaced by portable and inexpensive hardware, rapidly improving markerless tracking algorithms, and open datasets that will turn these new technologies into field-wide team projects. To expedite progress in this direction, we have collected the CMU Panoptic Dataset 2.0, which contains 86 subjects captured with 140 VGA cameras, 31 HD cameras, and 15 IMUs, performing on average 6.5 min of activities, including range of motion activities and tasks of daily living.
In this dataset, a UR5 robot used 6 tools (metal-scissor, metal-whisk, plastic-knife, plastic-spoon, wooden-chopstick, and wooden-fork) to perform 6 behaviors: look, stirring-slow, stirring-fast, stirring-twist, whisk, and poke. The robot explored 15 objects kept in cylindrical containers: cane-sugar, chia-seed, chickpea, detergent, empty, glass-bead, kidney-bean, metal-nut-bolt, plastic-bead, salt, split-green-pea, styrofoam-bead, water, wheat, and wooden-button. The robot performed 10 trials on each object with each tool and behavior, resulting in 5,400 interactions (6 tools x 6 behaviors x 15 objects x 10 trials). The robot recorded multiple types of sensory data (audio, RGB images, depth images, haptic, and touch images) while interacting with the objects.
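The interaction count can be checked by enumerating every tool, behavior, object, and trial combination. The following is a minimal sketch using the names listed in the description; the variable names and the 0-based trial numbering are assumptions for illustration and may not match the dataset's actual file layout.

```python
from itertools import product

# Names taken from the dataset description.
TOOLS = ["metal-scissor", "metal-whisk", "plastic-knife",
         "plastic-spoon", "wooden-chopstick", "wooden-fork"]
BEHAVIORS = ["look", "stirring-slow", "stirring-fast",
             "stirring-twist", "whisk", "poke"]
OBJECTS = ["cane-sugar", "chia-seed", "chickpea", "detergent", "empty",
           "glass-bead", "kidney-bean", "metal-nut-bolt", "plastic-bead",
           "salt", "split-green-pea", "styrofoam-bead", "water", "wheat",
           "wooden-button"]
TRIALS = range(10)  # assumption: trials indexed 0..9

# Every (tool, behavior, object, trial) tuple is one interaction.
interactions = list(product(TOOLS, BEHAVIORS, OBJECTS, TRIALS))
print(len(interactions))  # 6 * 6 * 15 * 10 = 5400
```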
The data generated from this study are grouped into 3 main types: (1) participant demographic and clinical data, (2) sensor data from the different devices, as well as clinical scores and metadata related to the tasks performed, and (3) participant diaries collected during the in-clinic and at-home phases of the study. Throughout the data tables, timestamps are provided as UNIX epoch/POSIX time.
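Since timestamps are given as UNIX epoch/POSIX time, they can be converted to timezone-aware datetimes with the Python standard library. This is a minimal sketch that assumes the values are in seconds; if the tables store milliseconds, divide by 1000 first. The function name `epoch_to_utc` is hypothetical.

```python
from datetime import datetime, timezone

def epoch_to_utc(ts_seconds: float) -> datetime:
    """Convert a UNIX epoch timestamp (assumed to be in seconds) to UTC."""
    return datetime.fromtimestamp(ts_seconds, tz=timezone.utc)

# Example with an arbitrary timestamp (not taken from the dataset).
print(epoch_to_utc(1_600_000_000).isoformat())
```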
This file contains the data and code for the publication "The Federal Reserve's Response to the Global Financial Crisis and Its Long-Term Impact: An Interrupted Time-Series Natural Experimental Analysis" by A. C. Kamkoum, 2023.
This dataset provides neutron and gamma-ray pulse signals for pulse shape discrimination experiments. Several traditional and recently proposed pulse shape discrimination algorithms are applied to both the raw pulse signals and noise-enhanced datasets. These algorithms include zero-crossing (ZC), charge comparison (CC), falling edge percentage slope (FEPS), frequency gradient analysis (FGA), pulse-coupled neural network (PCNN), ladder gradient (LG), and heterogeneous quasi-continuous spiking cortical model (HQC-SCM). This dataset also provides the source code of all these pulse shape discrimination methods, together with the source code for schematic pulse shape discrimination performance evaluation and anti-noise performance evaluation.
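As an illustration of the simplest method in this list, charge comparison is commonly computed as the ratio of the charge in the tail of a pulse to its total charge. The sketch below is not the dataset's own source code: the function name, the gate offsets, the choice to start integration at the peak, and the synthetic pulses are all assumptions made for illustration.

```python
import math

def charge_comparison(pulse, tail_start, gate_end):
    """Tail-to-total charge ratio, with gates measured in samples from the peak.

    tail_start and gate_end are hypothetical gate offsets; suitable values
    depend on the detector and sampling rate.
    """
    peak = max(range(len(pulse)), key=pulse.__getitem__)
    total = sum(pulse[peak:peak + gate_end])
    tail = sum(pulse[peak + tail_start:peak + gate_end])
    if total == 0:
        raise ValueError("total charge is zero; cannot form a ratio")
    return tail / total

# Synthetic pulses (not from the dataset): one fast decay component only,
# and one with an added slow component.
fast_only = [math.exp(-t / 5) for t in range(200)]
with_slow = [0.7 * math.exp(-t / 5) + 0.3 * math.exp(-t / 50) for t in range(200)]

ratio_fast = charge_comparison(fast_only, tail_start=10, gate_end=100)
ratio_slow = charge_comparison(with_slow, tail_start=10, gate_end=100)
print(ratio_fast, ratio_slow)
```

A pulse with a larger slow component yields a larger ratio, so a threshold on this value separates the two pulse classes in the idealized case.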
ISOD contains 2,000 manually labelled RGB-D images from 20 diverse sites, each featuring over 30 types of small objects randomly placed amidst the items already present in the scenes. These objects, typically ≤3 cm in height, include LEGO blocks, rags, slippers, gloves, shoes, cables, crayons, chalk, glasses, smartphones (and their cases), fake banana peels, fake pet waste, and piles of toilet paper, among others. These items were chosen because they either threaten the safe operation of indoor mobile robots or create messes if run over.
DeepGraviLens is a dataset of simulated gravitational lenses consisting of images associated with brightness variation time series. In this dataset, both non-transient and transient phenomena (supernova explosions) are simulated.
Blockchain has empowered computer systems to be more secure using a distributed network. However, the current blockchain design suffers from fairness issues in transaction ordering. Miners are able to reorder transactions to generate profits, the so-called miner extractable value (MEV). Existing research recognizes MEV as a severe security issue and proposes potential solutions, including the prominent Flashbots. However, previous studies have mostly analyzed blockchain data, which might not capture the impacts of MEV in a much broader AI society. Thus, in this research, we applied natural language processing (NLP) methods to comprehensively analyze topics in tweets on MEV. We collected more than 20,000 tweets with the #MEV and #Flashbots hashtags and analyzed their topics. Our results show that the tweets discussed profound topics of ethical concern, including security, equity, emotional sentiments, and the desire for solutions to MEV. We also identify the co-movements of MEV activities on blo
As CryptoPunks pioneers the innovation of non-fungible tokens (NFTs) in AI and art, the valuation mechanics of NFTs have become a trending topic. Earlier research identifies the impact of ethics and society on the price prediction of CryptoPunks. Since the booming year of the NFT market in 2021, the discussion of CryptoPunks has propagated on social media. Still, existing literature has not considered the social sentiment factors after this historical turning point in NFT valuation. In this paper, we study how sentiments in social media, together with gender and skin tone, contribute to NFT valuations through an empirical analysis of social media, blockchain, and crypto exchange data. We find evidence that social sentiments are a significant contributor to the price prediction of CryptoPunks. Furthermore, we document structural changes in the valuation mechanics before and after 2021. Although people's attitudes towards CryptoPunks are primarily positive, our findings reflect imbalances in transaction act