Datasets

271 machine learning datasets


iV2V and iV2I+ (AI4Mobile Industrial Wireless Datasets: iV2V and iV2I+)

This dataset provides wireless measurements from two industrial testbeds: iV2V (industrial Vehicle-to-Vehicle) and iV2I+ (industrial Vehicular-to-Infrastructure plus sensor).

1 paper · 0 benchmarks · LiDAR, Point cloud, Tabular, Time series

Binette's 2022 Inventors Benchmark

Hand-disambiguation of a sample of U.S. patent inventor mentions from PatentsView.org.

1 paper · 0 benchmarks · Tabular

Poisoned Water Detection using Smartphone embedded WiFi CSI data and Machine Learning Algorithms (Dataset and machine learning algorithms to detect poisoned water from clean water via using Smartphone embedded Wi-Fi CSI data.)

This repository contains a dataset and machine learning algorithms for distinguishing poisoned water from clean water using equivalent smartphone-embedded Wi-Fi CSI data.

1 paper · 0 benchmarks · Tables, Tabular, Time series

Regensburg Pediatric Appendicitis Dataset

This dataset was acquired in a retrospective study from a cohort of pediatric patients admitted with abdominal pain to Children’s Hospital St. Hedwig in Regensburg, Germany. Multiple abdominal B-mode ultrasound images were acquired for most patients, with the number of views varying from 1 to 15. The images depict various regions of interest, such as the abdomen’s right lower quadrant, appendix, intestines, lymph nodes and reproductive organs. Alongside multiple US images for each subject, the dataset includes information encompassing laboratory tests, physical examination results, clinical scores, such as Alvarado and pediatric appendicitis scores, and expert-produced ultrasonographic findings. Lastly, the subjects were labeled w.r.t. three target variables: diagnosis (appendicitis vs. no appendicitis), management (surgical vs. conservative) and severity (complicated vs. uncomplicated or no appendicitis). The study was approved by the Ethics Committee of the University of Regensburg (

1 paper · 0 benchmarks · Images, Tabular

Uncertainty and Concept Drift (On the Connection between Concept Drift and Uncertainty in Industrial Artificial Intelligence)

AI-based digital twins, technologically empowered by the Internet of Things and real-time data analysis, are at the leading edge of the Industry 4.0 revolution. Information collected from industrial assets is produced continuously, yielding data streams that must be processed under stringent timing constraints. Such data streams are usually subject to non-stationary phenomena, so their data distribution may change, and the knowledge captured by the models used for data analysis may become obsolete (the so-called concept drift effect). Early detection of the change (drift) is crucial for updating the model’s knowledge, which is especially challenging in scenarios where the ground truth associated with the stream data is not readily available. Among many other techniques, the estimation of the model’s confidence has been tentatively suggested in a few studies as a criterion for detecting drifts in unsupervised settings. The goal of this m

1 paper · 0 benchmarks · Tabular

international faces

"The Chicago Face Database was developed at the University of Chicago by Debbie S. Ma, Joshua Correll, and Bernd Wittenbrink. The CFD is intended for use in scientific research. It provides high-resolution, standardized photographs of male and female faces of varying ethnicity between the ages of 17-65. Extensive norming data are available for each individual model. These data include both physical attributes (e.g., face size) as well as subjective ratings by independent judges (e.g., attractiveness).

1 paper · 0 benchmarks · Images, Tabular

WDC Block (WDC Block: A Blocking Benchmark)

WDC Block is a benchmark for comparing the performance of blocking methods that are used as part of entity resolution pipelines.

1 paper · 0 benchmarks · Tabular

Multicenter dataset of simulated neuroimaging features - quadratic relationship with age

A detailed description of this dataset can be found in the Zenodo repository: https://zenodo.org/record/8119042#.ZK-jJC9BxhE

1 paper · 0 benchmarks · Tabular

Multicenter dataset of neuroimaging features (part I)

A detailed description of this dataset can be found in the Zenodo repository: https://zenodo.org/record/7845311#.ZK-jty9BxhE

1 paper · 0 benchmarks · Tabular

Multicenter dataset of neuroimaging features (part II)

A detailed description of this dataset can be found in the Zenodo repository: https://zenodo.org/record/7845361#.ZK-k7y9BxhE

1 paper · 0 benchmarks · Tabular

Can you predict product backorder?

Problem Statement

1 paper · 0 benchmarks · Tables, Tabular

OTTO Recommender Systems Dataset

The OTTO session dataset is a large-scale dataset intended for multi-objective recommendation research. We collected the data from anonymized behavior logs of the OTTO webshop and the app. The mission of this dataset is to serve as a benchmark for session-based recommendations and foster research in the multi-objective and session-based recommender systems area. We also launched a Kaggle competition with the goal to predict clicks, cart additions, and orders based on previous events in a user session.

1 paper · 0 benchmarks · Tabular

Dataset of a Study of Computational reproducibility of Jupyter notebooks from biomedical publications

The dataset was generated in a study of the computational reproducibility of Jupyter notebooks from biomedical publications. We analyzed the reproducibility of Jupyter notebooks from GitHub repositories associated with publications indexed in the biomedical literature repository PubMed Central. The dataset includes metadata on the journals, the publications, the GitHub repositories mentioned in the publications, and the notebooks present in those repositories.

1 paper · 0 benchmarks · Images, Tables, Tabular

Dataset of a Study of Computational reproducibility of Jupyter notebooks from biomedical publications version 1 (Version 1)

This repository contains the dataset for the study of the computational reproducibility of Jupyter notebooks from biomedical publications. We analyzed the reproducibility of Jupyter notebooks from GitHub repositories associated with publications indexed in the biomedical literature repository PubMed Central. The dataset includes metadata on the journals, the publications, the GitHub repositories mentioned in the publications, and the notebooks present in those repositories.

1 paper · 0 benchmarks · Images, Tables, Tabular

Pylon Benchmark (Pylon Table Union Search Benchmark)

We create a new dataset from GitTables, a data lake of 1.7M tables extracted from CSV files on GitHub. The benchmark comprises 1,746 tables, including union-able table subsets under topics selected from Schema.org: scholarly article, job posting, and music playlist. We settled on these three topics because the corpus contains a fair number of union-able tables for them from diverse sources. Union-able tables from a single source are easy to find, but they are less interesting for table union search: simple syntactic methods can identify all of them because they share the same schema and consistent value representations.

1 paper · 0 benchmarks · Tabular

List of OWL reasoners

CSV file with a list of all examined OWL reasoners. For each item, information on usability and maintenance status, project pages, source code repositories and related documentation was gathered.

1 paper · 0 benchmarks · Tabular

OPFLearnData (OPFLearnData: Dataset for Learning AC Optimal Power Flow)

The datasets were generated with OPFLearn.jl, a Julia package for creating AC OPF datasets. The package was developed to give researchers a standardized way to efficiently create AC OPF datasets that represent more of the AC OPF feasible load space than typical dataset creation methods do. The OPFLearn dataset creation method uses a relaxed AC OPF formulation to reduce the volume of the unclassified input space throughout the dataset creation process. The dataset contains load profiles and their respective optimal primal and dual solutions. Load samples are processed using AC OPF formulations from PowerModels.jl. More information on the dataset creation method can be found in our publication, "OPF-Learn: An Open-Source Framework for Creating Representative AC Optimal Power Flow Datasets", and on the package website: https://github.com/NREL/OPFLearn.jl.

1 paper · 0 benchmarks · Tabular


Dataset of Paper Corpus

Multi-Labelled SMILES Odors dataset

Supplementary Material (Annotation Table of Review)
