Datasets

271 machine learning datasets

Single Point Corn Yield Data (Single Point Corn Yield Data - Weather, Soil, Cultivation Area, and Yield for Precision Agriculture)

This data comprises processed weather, soil, yield, and cultivation area records for corn yield prediction in Sub-Saharan Africa, with emphasis on Nigeria. The data was collected to design a corn yield prediction model that helps smallholder farmers make smart farming decisions. However, the data can serve several other purposes through analysis and interpretation.

1 paper · 0 benchmarks · Tabular

TML1M (Table-MovieLens1M)

Table-MovieLens1M (TML1M) is a relational table dataset derived from the classical MovieLens1M dataset. It consists of three tables: users, movies, and ratings. Notably, the movie table has been enriched with more comprehensive features. Additionally, the dataset defines a standard classification task focused on predicting user age ranges.

1 paper · 1 benchmark · Tabular

TLF2K (Table-LastFm2K)

Table-LastFm2K (TLF2K) is a relational table dataset derived from the classical LastFM2K dataset. It contains three tables: artists, user_artists, and user_friends. Notably, the artists table has been enhanced with more detailed features, and the tags for each artist have been streamlined. The dataset also provides a standard classification task for music genre classification of artists.

1 paper · 1 benchmark · Tabular

TACM12K (Table-ACM12K)

Table-ACM12K (TACM12K) is a relational table dataset derived from the ACM heterogeneous graph dataset. It includes four tables: papers, authors, citations, and writings. The paper table features attributes such as year, title, and abstract, while the author table includes name and affiliation details. Additionally, some feature completion has been performed for the papers. The dataset also defines a standard classification task for predicting the conference to which a paper belongs.

1 paper · 1 benchmark · Tabular

Dataset for algorithmic thinking skills assessment: Results from the virtual CAT pilot study in Swiss compulsory education

Overview: This dataset was collected during a pilot study that evaluated the virtual Cross Array Task (CAT) platform as an assessment tool for algorithmic thinking (AT) skills among K-12 students in Swiss compulsory education. As algorithmic thinking becomes increasingly vital in our digital age, this study bridges the gap between traditional assessments and the needs of today's learners by introducing a digital platform. The virtual CAT, a digital adaptation of an unplugged assessment activity, offers scalable, automated assessments with reduced human intervention.

1 paper · 0 benchmarks · Tabular

An Advanced Guide To Trade Policy Analysis

This R package, documented in a very similar way to the book R4DS, provides functions to replicate the original Stata results from the book An Advanced Guide to Trade Policy Analysis.

1 paper · 0 benchmarks · Tabular

RSM-based multi-objective optimization using desirability functions

The following files contain the simulation inputs and outputs for the multi-objective optimization of thermal comfort and daylight with the Response Surface Methodology. These files are needed to run the R script, which is available together with the datasets in the GitHub repository.

1 paper · 0 benchmarks · Tabular

SemTabNet

Dataset Card for SemTabNet This dataset accompanies the following paper:

1 paper · 1 benchmark · Tables, Tabular, Texts

RClicks

We conducted a large crowdsourcing study of click patterns in an interactive segmentation scenario and collected 475K real-user clicks. Drawing on ideas from saliency tasks, we develop a clickability model that enables sampling clicks that closely resemble actual user inputs. Using our model and dataset, we propose the RClicks benchmark for a comprehensive comparison of existing interactive segmentation methods on realistic clicks. Specifically, we evaluate not only the average quality of methods, but also their robustness w.r.t. click patterns.

1 paper · 0 benchmarks · Actions, Images, Interactive, Tables, Tabular

Completion norms for 3085 English sentence contexts

In everyday language processing, sentence context affects how readers and listeners process upcoming words. In experimental situations, it can be useful to identify words that are predicted to greater or lesser degrees by the preceding context. Here we report completion norms for 3085 English sentences, collected online using a written cloze procedure in which participants were asked to provide their best guess for the word completing a sentence. Sentences ranged from 8 to 10 words in length. At least 100 unique participants contributed to each sentence. All responses were reviewed by human raters to mitigate the influence of misspellings and typographical errors. The responses provide a range of predictability values for 13,438 unique target words, 6,790 of which appear in more than one sentence context. We also provide entropy values based on the relative predictability of multiple responses. Finally, we provide the code used to collate and organize the responses to facilitate addit

1 paper · 0 benchmarks · Tabular
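
For the completion norms entry above, the two quantities it mentions (predictability of a target word and entropy over responses) can be illustrated with a minimal Python sketch. This is not the authors' code: it assumes predictability is the share of participants giving a response (cloze probability) and that entropy is Shannon entropy in bits. The listing does not confirm either definition. The function name `cloze_stats`, the lowercasing step, and the sample responses are hypothetical.

```python
from collections import Counter
from math import log2


def cloze_stats(responses):
    """Return (cloze probabilities, entropy in bits) for one sentence context.

    Assumption: responses are normalized by trimming whitespace and
    lowercasing; the dataset's own cleaning rules may differ.
    """
    counts = Counter(r.strip().lower() for r in responses)
    total = sum(counts.values())
    # Cloze probability: fraction of participants who gave each word.
    probs = {word: n / total for word, n in counts.items()}
    # Shannon entropy (assumed definition): higher means less agreement.
    entropy = -sum(p * log2(p) for p in probs.values())
    return probs, entropy


# Hypothetical responses from four participants for one sentence.
responses = ["dog", "dog", "cat", "Dog"]
probs, entropy = cloze_stats(responses)
print(probs)              # cloze probability per response
print(round(entropy, 4))  # entropy in bits
```

A context where every participant gives the same word has entropy 0 under this definition, while a context with many equally likely completions has high entropy.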

MVX (Multimodal V2X)

MVX incorporates realistic physical world simulation with a differentiable accurate ray tracing wireless simulation that includes multi-agent and multimodal datasets for AI-driven digital twin applications in vehicular communication systems.

1 paper · 1 benchmark · Images, LiDAR, Tabular, Videos

41598_2022_22531_MOESM2_ESM.xlsx

The datasets used and analysed from the glucose clamp study are available in this Excel file. They include pseudonymised information on the participants, somatometric data, biomarkers of lipid metabolism and parameters of insulin-glucose homeostasis, i.e. concentrations of insulin, glucose and c-peptide as well as data from glucose-clamp experiments, HOMA, SPINA Carb parameters (SPINA-GBeta and SPINA-GR), Matsuda index, insulinogenic index, disposition index and McAuley index.

1 paper · 0 benchmarks · Biomedical, Medical, Tabular, Time series

41598_2022_22531_MOESM1_ESM.dif

The datasets used and analysed from the glucose clamp study are available in this DIF file. They include pseudonymised information on the participants, somatometric data, biomarkers of lipid metabolism and parameters of insulin-glucose homeostasis, i.e. concentrations of insulin, glucose and c-peptide as well as data from glucose-clamp experiments, HOMA, SPINA Carb parameters (SPINA-GBeta and SPINA-GR), Matsuda index, insulinogenic index, disposition index and McAuley index.

1 paper · 0 benchmarks · Biomedical, Medical, Tabular, Time series

Twitter job title prediction

We introduce a dataset consisting of 1314 samples, including users’ tweets and bios. Each user’s job title is found by crawling Wikipedia. The challenge of multiple job titles per user is handled using a semantic word embedding and clustering method. Then, a job prediction method is introduced based on a deep neural network and TF-IDF word embedding. We also use hashtags and emojis in the tweets for job prediction. Results show that the job titles of Twitter users can be predicted with 54% accuracy across nine categories.

1 paper · 0 benchmarks · Tables, Tabular, Texts

Training data for "Harnessing Machine Learning for Single-Shot Measurement of Free Electron Laser Pulse Power"

This repository contains data for the NeurIPS conference paper titled "Harnessing Machine Learning for Single-Shot Measurement of Free Electron Laser Pulse Power".

1 paper · 0 benchmarks · Images, Physics, Tabular

Spambase

Classifying Email as Spam or Non-Spam.

1 paper · 0 benchmarks · Tabular

PDFM Embeddings (Population Dynamics Foundation Model Embeddings)

PDFM Embeddings are condensed vector representations designed to encapsulate the complex, multidimensional interactions among human behaviors, environmental factors, and local contexts at specific locations. These embeddings capture patterns in aggregated data such as search trends, busyness trends, and environmental conditions (maps, air quality, temperature), providing a rich, location-specific snapshot of how populations engage with their surroundings. Aggregated over space and time, these embeddings ensure privacy while enabling nuanced spatial analysis and prediction for applications ranging from public health to socioeconomic modeling.

1 paper · 0 benchmarks · Tabular

Student's EEG Brain Signal

This dataset consists of EEG (Electroencephalogram) recordings collected from students at our college during an educational experiment. The objective of this dataset is to evaluate students' cognitive engagement and learning effectiveness while interacting with educational content.

1 paper · 0 benchmarks · Tabular

French Open Science Monitor Dataset

This dataset contains the publication data underlying the French Open Science Monitor.

1 paper · 0 benchmarks · Tabular

RFSD (Russian Financial Statements Database)

The Russian Financial Statements Database (RFSD) is an open, harmonized collection of annual unconsolidated financial statements of the universe of Russian firms.

1 paper · 0 benchmarks · Tabular
