Papers With Code 2 | ML Benchmarks, SotA Results & Code

Deep Sea Treasure Pareto-Front

The dataset contains two Pareto-fronts:
- The Pareto-front for the 2-objective problem
- The Pareto-front for the 3-objective problem

1 paper | 0 benchmarks | Tables

washed_contract

The dataset contains about 48K contracts that are open source on Etherscan.

1 paper | 0 benchmarks | Tables

Industrial Benchmark Dataset for Customer Escalation Prediction

This is a real-world industrial benchmark dataset from a major medical device manufacturer for the prediction of customer escalations. The dataset contains features derived from IoT (machine log) and enterprise data including labels for escalation from a fleet of thousands of customers of high-end medical devices.

1 paper | 0 benchmarks | Tables

Rice Dataset Commeo and Osmancik

Data Set Name: Rice Dataset (Commeo and Osmancik). Abstract: A total of 3,810 rice grain images were taken for the two species (Cammeo and Osmancik), processed, and feature inferences were made. Seven morphological features were obtained for each grain of rice.

1 paper | 0 benchmarks | Tables, Tabular

Acoustic Extinguisher Fire Dataset

Yavuz Selim TASPINAR, Murat KOKLU and Mustafa ALTIN

1 paper | 0 benchmarks | Tables

wildFireClimateChangeTweets

Here I provide the datasets I used for this analysis. They include the tweets I streamed using the Tweepy package in Python during the peak of the wildfire season in late summer and early fall of 2020.

1 paper | 0 benchmarks | Tables, Texts

Replication Data for: Investigating the concentration of High Yield Investment Programs in the United Kingdom

The dataset provides information about 450 HYIPs collected between November 2020 and September 2021. This dataset was analyzed and the results are discussed in the paper.

1 paper | 0 benchmarks | Tables

Water Footprint Recommender System Data

The dataset contains data from two different sources: Food.com, a well-known American recipe site, and Planeat, an Italian site that lets you plan recipes to reduce food waste. It is divided into two parts: embeddings, which can be used directly to run the work and receive suggestions, and raw data, which must first be processed into embeddings.

1 paper | 0 benchmarks | Tables, Texts

SRSD-Feynman (Easy set)

Our SRSD (Feynman) datasets are designed to discuss the performance of Symbolic Regression for Scientific Discovery. We carefully reviewed the properties of each formula and its variables in the Feynman Symbolic Regression Database to design reasonably realistic sampling ranges of values, so that our SRSD datasets can be used to evaluate the potential of SRSD, such as whether or not an SR method can (re)discover physical laws from such datasets.

1 paper | 0 benchmarks | Tables, Tabular

SRSD-Feynman (Hard set)

Our SRSD (Feynman) datasets are designed to discuss the performance of Symbolic Regression for Scientific Discovery. We carefully reviewed the properties of each formula and its variables in the Feynman Symbolic Regression Database to design reasonably realistic sampling ranges of values, so that our SRSD datasets can be used to evaluate the potential of SRSD, such as whether or not an SR method can (re)discover physical laws from such datasets.

1 paper | 0 benchmarks | Tables, Tabular

SRSD-Feynman (Medium set)

Our SRSD (Feynman) datasets are designed to discuss the performance of Symbolic Regression for Scientific Discovery. We carefully reviewed the properties of each formula and its variables in the Feynman Symbolic Regression Database to design reasonably realistic sampling ranges of values, so that our SRSD datasets can be used to evaluate the potential of SRSD, such as whether or not an SR method can (re)discover physical laws from such datasets.

1 paper | 0 benchmarks | Tables, Tabular

kaggle stroke Prediction competition

A Kaggle competition on stroke prediction. The dataset is heavily imbalanced.

1 paper | 0 benchmarks | Medical, Tables

Poisoned Water Detection using Smartphone embedded WiFi CSI data and Machine Learning Algorithms (Dataset and machine learning algorithms to detect poisoned water from clean water via using Smartphone embedded Wi-Fi CSI data.)

This repository contains a dataset and machine learning algorithms to distinguish poisoned water from clean water using equivalent smartphone-embedded Wi-Fi CSI data.

1 paper | 0 benchmarks | Tables, Tabular, Time series

Statcan Dialogue Dataset

The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents

1 paper | 1 benchmark | Tables, Texts

bSDD (buildingSMART Data Dictionary)

The buildingSMART Data Dictionary (bSDD) is an online service that hosts classifications and their properties, allowed values, units and translations. The bSDD allows linking between all the content inside the database. It provides a standardized workflow to guarantee data quality and information consistency.

1 paper | 0 benchmarks | CAD, Tables, Texts

data_qe (Federal Reserve Quantitative Easing Data)

This file contains the data and code for the publication "The Federal Reserve's Response to the Global Financial Crisis and Its Long-Term Impact: An Interrupted Time-Series Natural Experimental Analysis" by A. C. Kamkoum, 2023.

1 paper | 0 benchmarks | Graphs, Tables, Time series

Can you predict product backorder?

Problem Statement

1 paper | 0 benchmarks | Tables, Tabular

MineralImage5k (Benchmark for 5k raw mineral species recognition)

We present a comprehensive dataset comprising a vast collection of raw mineral samples for the purpose of mineral recognition. The dataset encompasses more than 5,000 distinct mineral species and incorporates subsets for zero-shot and few-shot learning. In addition to the samples themselves, some entries in the dataset are accompanied by supplementary natural language descriptions, size measurements, and segmentation masks. For detailed information on each sample, please refer to the minerals_full.csv file.

1 paper | 0 benchmarks | Images, Tables, Texts

Notebook Inaccessibility

This dataset artifact contains the intermediate datasets from pipeline executions necessary to reproduce the results of the paper. We share this artifact in hopes of providing a starting point for other researchers to extend the analysis on notebooks, discover more about their accessibility, and offer solutions to make data science more accessible. The scripts needed to generate these datasets and analyse them are shared in the GitHub repository for this work.

1 paper | 0 benchmarks | Tables

Dataset of a Study of Computational reproducibility of Jupyter notebooks from biomedical publications

The dataset is generated from a study of the computational reproducibility of Jupyter notebooks from biomedical publications. We analyzed the reproducibility of Jupyter notebooks from GitHub repositories associated with publications indexed in the biomedical literature repository PubMed Central. The dataset includes metadata on the journals, the publications, the GitHub repositories mentioned in the publications, and the notebooks present in those repositories.

1 paper | 0 benchmarks | Images, Tables, Tabular
