Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

298 machine learning datasets

298 dataset results

Earth’s Mantle Convection

The dataset, generated from a scientific simulation, consists of a time series (251 steps) of 3D scalar fields on a spherical 180x201x360 grid covering 500 Myr of geological time. Each time step is 2 Myr, and the fields are:

1 paper · 0 benchmarks · 3D, Time series

Duolingo Bandit Notifications

Replication datasets (200 million rows) used in experiments by Yancey & Settles (2020). (2019-06-11)

1 paper · 0 benchmarks · Time series

INSTANCE (the Italian seismic dataset for machine learning)

INSTANCE is a data collection of more than 1.3 million seismic waveforms originating from a selection of about 54,000 earthquakes that have occurred since 2005 in Italy and surrounding regions, together with seismic noise recordings randomly extracted from event-free time windows of the continuous waveform archive. The purpose is to provide reference datasets useful for developing and testing seismic data processing routines based on machine learning and deep learning frameworks. The primary source of this information is ISIDe (Italian Seismological Instrumental and Parametric Data-Base) for earthquakes and the Italian node of EIDA (http://eida.ingv.it) for seismic data. All the waveforms have been sized to a 120 s window, preprocessed, and resampled at 100 Hz. For each trace we provide a large number of parameters as metadata, either derived from event information or computed from trace data. Associated metadata allow for the identification of the source, the station, the path travelled by seismic waves and asse

1 paper · 0 benchmarks · Time series

ECG in High Intensity Exercise Dataset

The data presented here was extracted from a larger dataset collected through a collaboration between the Embedded Systems Laboratory (ESL) of the Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland, and the Institute of Sports Sciences of the University of Lausanne (ISSUL). In this dataset, we report the extracted segments used for an analysis of R peak detection algorithms during high-intensity exercise.

1 paper · 0 benchmarks · Biomedical, Time series

AU Dataset for Visuo-Haptic Object Recognition for Robots

Multimodal object recognition is still an emerging field, so publicly available datasets remain rare and small. This dataset was developed to help fill this void and presents multimodal data for 63 objects with some visual and haptic ambiguity. The dataset contains visual, kinesthetic, and tactile (audio/vibrations) data. Completely resolving the sensory ambiguity would require sensory integration/fusion. This report describes the creation and structure of the dataset. The first section explains the underlying approach used to capture the visual and haptic properties of the objects. The second section describes the technical aspects (experimental setup) needed for the collection of the data. The third section introduces the objects, while the final section describes the structure and content of the dataset.

1 paper · 0 benchmarks · Images, Tabular, Time series

Drosophila Immunity Time-Course Data

The data used for all results in this paper can be found here. This directory contains:

1 paper · 0 benchmarks · Biology, Tabular, Time series

Synthetic Visual Inspections

Synthetic visual inspection data of structural elements in bridges. The data is generated using the OpenIPDM toolbox "Generate Synthetic Dataset". For further details about the data generation and the properties of the dataset, refer to the software manual at https://github.com/CivML-PolyMtl/OpenIPDM/blob/main/Help

1 paper · 0 benchmarks · Time series

LARa (Logistic Activity Recognition Challenge)

LARa is the first freely accessible logistics dataset for human activity recognition. In the "Innovationlab Hybrid Services in Logistics" at TU Dortmund University, two picking scenarios and one packing scenario with 14 subjects were recorded using OMoCap, IMUs, and an RGB camera. 758 minutes of recordings were labeled by 12 annotators in 474 person-hours. The subsequent revision was carried out by 4 revisers in 143 person-hours. All the data have been labeled and categorised into 8 activity classes and 19 binary coarse-semantic descriptions, also called attributes.

1 paper · 0 benchmarks · Actions, Time series

CIP (Complete Inertial Pose)

The CIP dataset is composed of 2 subsets, containing low-cost (MPU9250) and high-end (MTwAwinda) Magnetic, Angular Rate, and Gravity (MARG) sensor data respectively. It provides data for the analysis of the complete inertial pose pipeline, from raw measurements, to sensor-to-segment calibration, multi-sensor fusion, skeleton kinematics, and the complete human pose. Multiple trials were collected with 21 and 10 subjects respectively, performing 6 types of movements (ranging from calibration to daily activities, range-of-motion, and random). It presents a high degree of variability and complex dynamics while containing common sources of error found in real conditions. This amounts to 3.5M samples, synchronized with a ground-truth inertial motion capture system (Xsens) at 60 Hz. This dataset may help to assess, benchmark, and develop novel algorithms for each of the pipeline's processing steps, with applications in classic or data-driven inertial pose estimation algorithms, human movem

1 paper · 0 benchmarks · Time series

Volunteer task execution events in Galaxy Zoo and The Milky Way citizen science projects

Context of the data sets: The Zooniverse platform (www.zooniverse.org) has successfully built a large community of volunteers contributing to citizen science projects. Galaxy Zoo and the Milky Way Project were hosted there.

1 paper · 0 benchmarks · Actions, Tabular, Time series

CANDOR Corpus (CANDOR = Conversation: A Naturalistic Dataset of Online Recordings)

The CANDOR corpus is a large, novel, multimodal corpus of 1,656 recorded conversations in spoken English. This 7+ million-word, 850-hour corpus totals over 1 TB of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, along with an extensive survey of speakers' post-conversation reflections.

1 paper · 0 benchmarks · Images, Tabular, Texts, Time series, Videos

SerialTrack Particle Image Dataset

This dataset accompanies the linked SerialTrack paper and provides test case data (2D/3D, varying particle density) across a range of synthetic and experimental imaging modalities. Included test cases can be used for further code development, validation of and comparisons for existing particle tracking codes, and/or evaluating and learning to use our SerialTrack code on known data.

1 paper · 0 benchmarks · 3D, Images, Time series

Hello Watt (Hello Watt electricity consumption curves)

Hello Watt collects power usage data at a resolution of 30 minutes. To develop and test our disaggregation methods, we consider a subsample consisting of one month of power consumption for 5k households with off-peak pricing contracts. In addition to the type of their water heating, some users also provide metadata such as the home surface area and the number of inhabitants.

1 paper · 0 benchmarks · Time series

Battery test data - fast formation study

Forty prismatic lithium-ion pouch cells were built at the University of Michigan Battery Laboratory. The cells have a nominal capacity of 2.36 Ah and comprise an NCM111 cathode and a graphite anode. Cells were formed using two different formation protocols: "fast formation" and "baseline formation". After formation, cells were put under cycle life testing at room temperature and 45 °C. Cells were cycled until the discharge capacities dropped below 50% of the initial capacities. Data was collected by the cycler equipment (Maccor) during both the formation process and the cycling test. Data was processed in the Voltaiq software and subsequently exported as .csv files.

1 paper · 0 benchmarks · Time series

Bosch CNC Machining Dataset

The dataset is a collection of real-world industrial vibration data from a brownfield CNC milling machine. The acceleration was measured using a tri-axial accelerometer (Bosch CISS Sensor) mounted inside the machine. The X-, Y-, and Z-axes of the accelerometer were recorded at a sampling rate of 2 kHz. Both normal and anomalous data were collected for 4 different timeframes, each lasting 5 months from February 2019 until August 2021, and labelled accordingly. The data can be used to investigate the scalability of models and to research process variations, as the anomaly impact differs. In total there is data from three different CNC milling machines, each executing 15 processes. For a detailed description of the data and experimental set-up, please refer to the paper: https://doi.org/10.1016/j.procir.2022.04.022

1 paper · 0 benchmarks · Time series

DEAP City Dataset

Main dataset: city_pollution_data.csv

1 paper · 0 benchmarks · Environment, Graphs, Tabular, Time series

Sequence Consistency Evaluation (SCE) tests

Sequence Consistency Evaluation (SCE) is a benchmark task for evaluating sequence consistency.

1 paper · 0 benchmarks · Images, Time series

A Simulated 4-DOF Ship Motion Dataset for System Identification under Environmental Disturbances

This dataset contains data from 125 one-hour simulations of ship motion during various sea states, with the ship performing random maneuvers in 4 degrees of freedom (surge-sway-yaw-roll). The original ship is a patrol ship developed by Perez et al. [1]. We have extended it with a set of two symmetrically placed rudder propellers. Additionally, we simulate wind forces according to Isherwood's wind model [2]. Wind-induced waves are generated with the JONSWAP spectrum [3], and the corresponding wave forces are then computed using wave force response amplitude operators (RAO).

1 paper · 0 benchmarks · Time series

Censored_Planet_Quack (Censored Planet HyperQuack Echo)

Hyperquack v.2 response data, consisting of structured data records in JSON.

1 paper · 0 benchmarks · Time series

Baxter-UR5_95-Objects

In this dataset, two robots, Baxter and UR5, perform 8 behaviors (look, grasp, pick, hold, shake, lower, drop, and push) on 95 objects that vary by 5 colors (blue, green, red, white, and yellow), 6 contents (wooden buttons, plastic dice, glass marbles, nuts & bolts, pasta, and rice), and 4 weights (empty, 50g, 100g, and 150g). There are 90 objects with contents (5 colors x 3 weights x 6 contents) and 5 objects without any content that vary only by color. Both robots perform 5 trials on each object, resulting in 7,600 interactions (2 robots x 8 behaviors x 95 objects x 5 trials

1 paper · 0 benchmarks · Actions, Audio, Images, Interactive, RGB Video, RGB-D, Time series, Videos