298 machine learning datasets
The dataset, generated from a scientific simulation, consists of a time series (251 steps) of 3D scalar fields on a spherical 180x201x360 grid covering 500 Myr of geological time. Each time step spans 2 Myr, and the fields are:
Replication datasets (200 million rows) used in experiments by Yancey & Settles (2020). (2019-06-11)
INSTANCE is a data collection of more than 1.3 million seismic waveforms originating from a selection of about 54,000 earthquakes that have occurred since 2005 in Italy and surrounding regions, together with seismic noise recordings randomly extracted from event-free time windows of the continuous waveform archive. The purpose is to provide reference datasets for developing and testing seismic data processing routines based on machine learning and deep learning frameworks. The primary source of this information is ISIDe (Italian Seismological Instrumental and Parametric Data-Base) for earthquakes and the Italian node of EIDA (http://eida.ingv.it) for seismic data. All the waveforms have been sized to a 120 s window, preprocessed and resampled at 100 Hz. For each trace we provide a large number of parameters as metadata, either derived from event information or computed from trace data. Associated metadata allow for the identification of the source, the station, the path travelled by seismic waves and asse
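As a quick illustration of the trace geometry described for INSTANCE (a 120 s window resampled at 100 Hz), the following minimal sketch computes the expected number of samples per channel and a matching time axis. The function names are hypothetical, and the sketch assumes the window end point is excluded; the actual files may differ by one sample.

```python
# Sketch only: window length and sampling rate come from the description above;
# function names and the end-point convention are assumptions.
WINDOW_SECONDS = 120
SAMPLING_RATE_HZ = 100


def samples_per_trace(window_s: float, rate_hz: float) -> int:
    """Number of samples in one channel, end point excluded."""
    return int(round(window_s * rate_hz))


def time_axis(window_s: float, rate_hz: float) -> list[float]:
    """Time stamps in seconds, relative to the start of the window."""
    n = samples_per_trace(window_s, rate_hz)
    return [i / rate_hz for i in range(n)]


n_samples = samples_per_trace(WINDOW_SECONDS, SAMPLING_RATE_HZ)
t = time_axis(WINDOW_SECONDS, SAMPLING_RATE_HZ)
print(n_samples, t[0], t[-1])
```

Under these assumptions each channel holds 12,000 samples, which is a convenient fixed input length for a model.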
The data presented here was extracted from a larger dataset collected through a collaboration between the Embedded Systems Laboratory (ESL) of the Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland and the Institute of Sports Sciences of the University of Lausanne (ISSUL). In this dataset, we report the extracted segments used for an analysis of R-peak detection algorithms during high-intensity exercise.
Multimodal object recognition is still an emerging field. Thus, publicly available datasets are still rare and of small size. This dataset was developed to help fill this void and presents multimodal data for 63 objects with some visual and haptic ambiguity. The dataset contains visual, kinesthetic and tactile (audio/vibrations) data. To completely solve sensory ambiguity, sensory integration/fusion would be required. This report describes the creation and structure of the dataset. The first section explains the underlying approach used to capture the visual and haptic properties of the objects. The second section describes the technical aspects (experimental setup) needed for the collection of the data. The third section introduces the objects, while the final section describes the structure and content of the dataset.
The data used for all results in this paper can be found here. This directory contains:
Synthetic visual inspection data of structural elements in bridges. The data is generated using the OpenIPDM toolbox "Generate Synthetic Dataset". For further details about the data generation and the properties of the dataset, refer to the software manual at https://github.com/CivML-PolyMtl/OpenIPDM/blob/main/Help
LARa is the first freely accessible logistics-dataset for human activity recognition. In the ’Innovationlab Hybrid Services in Logistics’ at TU Dortmund University, two picking and one packing scenarios with 14 subjects were recorded using OMoCap, IMUs, and an RGB camera. 758 minutes of recordings were labeled by 12 annotators in 474 person-hours. The subsequent revision was carried out by 4 revisers in 143 person-hours. All the given data have been labeled and categorised into 8 activity classes and 19 binary coarse-semantic descriptions, also called attributes.
The CIP dataset is composed of 2 subsets, containing low-cost (MPU9250) and high-end (MTwAwinda) Magnetic, Angular Rate, and Gravity (MARG) sensor data respectively. It provides data for the analysis of the complete inertial pose pipeline, from raw measurements, to sensor-to-segment calibration, multi-sensor fusion, skeleton kinematics, to the complete human pose. Multiple trials were collected with 21 and 10 subjects respectively, performing 6 types of movements (ranging from calibration, to daily activities, range-of-motion and random). It presents a high degree of variability and complex dynamics while containing common sources of error found in real conditions. This amounts to 3.5M samples, synchronized with a ground-truth inertial motion capture system (Xsens) at 60 Hz. This dataset may help to assess, benchmark and develop novel algorithms for each of the pipeline's processing steps, with applications in classic or data-driven inertial pose estimation algorithms, human movem
Context of the data sets: The Zooniverse platform (www.zooniverse.org) has successfully built a large community of volunteers contributing to citizen science projects. Galaxy Zoo and the Milky Way Project were hosted there.
The CANDOR corpus is a large, novel, multimodal corpus of 1,656 recorded conversations in spoken English. This 7+ million word, 850-hour corpus totals over 1 TB of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, along with an extensive survey of speakers' post-conversation reflections.
This dataset accompanies the linked SerialTrack paper and provides test case data (2D/3D, varying particle density) across a range of synthetic and experimental imaging modalities. The included test cases can be used for further code development, for validating and comparing existing particle tracking codes, and/or for evaluating and learning to use our SerialTrack code on known data.
Hello Watt collects power usage data at a resolution of 30 minutes. To develop and test our disaggregation methods, we consider a subsample consisting of one month of power consumption from 5k households with off-peak pricing contracts. In addition to the type of their water heating, some users also provide metadata such as the home surface area and the number of inhabitants.
Forty prismatic lithium-ion pouch cells were built at the University of Michigan Battery Laboratory. The cells have a nominal capacity of 2.36 Ah and comprise an NCM111 cathode and a graphite anode. Cells were formed using two different formation protocols: "fast formation" and "baseline formation". After formation, cells were put under cycle life testing at room temperature and 45 °C. Cells were cycled until the discharge capacities dropped below 50% of the initial capacities. Data was collected by the cycler equipment (Maccor) during both the formation process and the cycling test. Data was processed in the Voltaiq software and subsequently exported as .csv files.
The dataset is a collection of real-world industrial vibration data collected from a brownfield CNC milling machine. Acceleration was measured using a tri-axial accelerometer (Bosch CISS Sensor) mounted inside the machine. The X-, Y-, and Z-axes of the accelerometer were recorded at a sampling rate of 2 kHz. Both normal and anomalous data were collected for 4 different timeframes, each lasting 5 months, from February 2019 until August 2021, and labelled accordingly. The dataset can be used to investigate the scalability of models and to research process variations, as the anomaly impact differs. In total there is data from three different CNC milling machines, each executing 15 processes. For a detailed description of the data and experimental set-up, please refer to the paper: https://doi.org/10.1016/j.procir.2022.04.022
Main Dataset city_pollution_data.csv
Sequence Consistency Evaluation (SCE) is a benchmark task for evaluating sequence consistency.
This dataset contains data from 125 1-hour simulations of ship motion during various sea states, performing random maneuvers in 4 degrees of freedom (surge-sway-yaw-roll). The original ship is a patrol ship developed by Perez et al. [1]. We have extended it with a set of two symmetrically placed rudder propellers. Additionally, we simulate wind forces according to Isherwood's wind model [2]. Wind-induced waves are generated with the JONSWAP spectrum [3], and the corresponding wave forces are then computed using wave force response amplitude operators (RAO).
Hyperquack v.2 response data, consisting of structured data records in JSON.
In this dataset, two robots, Baxter and UR5, perform 8 behaviors (look, grasp, pick, hold, shake, lower, drop, and push) on 95 objects that vary by 5 colors (blue, green, red, white, and yellow), 6 contents (wooden button, plastic dice, glass marbles, nuts & bolts, pasta, and rice), and 4 weights (empty, 50g, 100g, and 150g). There are 90 objects with contents (5 colors x 3 weights x 6 contents) and 5 objects without any content that only vary by 5 colors. Both robots perform 5 trials on each object, resulting in 7,600 interactions (2 robots x 8 behaviors x 95 objects x 5 trials
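The object and interaction counts quoted in the robot dataset description can be checked by enumerating the combinations. The sketch below is illustrative only: the variable names and the tuple layout are assumptions, not the dataset's actual file or label scheme.

```python
from itertools import product

# Values taken from the description; names and tuple layout are hypothetical.
ROBOTS = ["Baxter", "UR5"]
BEHAVIORS = ["look", "grasp", "pick", "hold", "shake", "lower", "drop", "push"]
COLORS = ["blue", "green", "red", "white", "yellow"]
CONTENTS = ["wooden button", "plastic dice", "glass marbles",
            "nuts & bolts", "pasta", "rice"]
FILLED_WEIGHTS = ["50g", "100g", "150g"]  # the fourth weight, "empty", has no content
TRIALS_PER_OBJECT = 5

# Objects with contents: every color x weight x content combination.
filled_objects = list(product(COLORS, FILLED_WEIGHTS, CONTENTS))

# Empty objects vary only by color.
empty_objects = [(color, "empty", None) for color in COLORS]

objects = filled_objects + empty_objects

# Each robot performs every behavior on every object, TRIALS_PER_OBJECT times.
n_interactions = len(ROBOTS) * len(BEHAVIORS) * len(objects) * TRIALS_PER_OBJECT

print(len(filled_objects), len(empty_objects), len(objects), n_interactions)
```

The enumeration reproduces the stated figures: 90 filled objects, 5 empty ones, 95 objects in total, and 7,600 interactions.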