52 machine learning datasets
This repository contains the dataset for a study of the computational reproducibility of Jupyter notebooks from biomedical publications. We analyzed the reproducibility of Jupyter notebooks from GitHub repositories associated with publications indexed in the biomedical literature repository PubMed Central. The dataset includes metadata for the journals, the publications, the GitHub repositories mentioned in the publications, and the notebooks present in those repositories.
Item-wise accuracies in six benchmarks from Open LLM Leaderboard 1, scraped from huggingface.co and used for the metabench analyses and construction. Datasets with RMSEs for random benchmark subsets, which are used as a reference in the paper, are also included here.
Dataset Card for SemTabNet This dataset accompanies the following paper:
We conducted a large crowdsourcing study of click patterns in an interactive segmentation scenario and collected 475K real-user clicks. Drawing on ideas from saliency tasks, we develop a clickability model that enables sampling clicks that closely resemble actual user inputs. Using our model and dataset, we propose the RClicks benchmark for a comprehensive comparison of existing interactive segmentation methods on realistic clicks. Specifically, we evaluate not only the average quality of the methods but also their robustness with respect to click patterns.
The Perfume Co-Preference Network dataset comprises comprehensive user reviews and ratings collected from the Persian retail platform Atrafshan. This dataset, central to our research on community detection in fragrance preferences, includes 36,434 comments from 7,387 unique users, providing insights into consumer sentiment towards various perfumes. It is designed to facilitate the analysis of user preferences through sentiment analysis, allowing for the clustering of perfumes based on shared attributes.
We introduce a dataset consisting of 1314 samples, including users’ tweets and bios. Each user’s job title is found by crawling Wikipedia. The challenge of multiple job titles per user is handled using a semantic word embedding and clustering method. A job prediction method is then introduced based on a deep neural network and TF-IDF word embedding. We also use hashtags and emojis in the tweets for job prediction. Results show that the job titles of Twitter users can be predicted with 54% accuracy across nine categories.
Abstract: Graph Neural Networks (GNNs) have recently gained traction in transportation, bioinformatics, language and image processing, but research on their application to supply chain management remains limited. Supply chains are inherently graph-like, making them ideal for GNN methodologies, which can optimize and solve complex problems. The barriers include a lack of proper conceptual foundations, familiarity with graph applications in SCM, and real-world benchmark datasets for GNN-based supply chain research. To address this, we discuss and connect supply chains with graph structures for effective GNN application, providing detailed formulations, examples, mathematical definitions, and task guidelines. Additionally, we present a multi-perspective real-world benchmark dataset from a leading FMCG company in Bangladesh, focusing on supply chain planning. We discuss various supply chain tasks using GNNs and benchmark several state-of-the-art models on homogeneous and heterogeneous grap
This dataset collection includes three files used for the experiments. Each file contains 6 columns: {timestep, vehicle ID, x coordinate in the map, y coordinate in the map, real bitrate, estimated bitrate}. The datasets, obtained from REMs with Gaussian estimation and real (https://ieee-dataport.org/open-access/crawdad-romataxi) or simulated (https://eclipse.dev/sumo/) vehicular mobility, are used in the original paper for optimizing the task of federated learning (client scheduling and resource allocation).
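The six-column layout described above can be read with a short script. This is a minimal sketch only: it assumes comma-separated rows with no header line, an integer timestep, and a string vehicle ID, none of which the description confirms. The names `Record`, `read_records`, and `mean_abs_error` are hypothetical helpers introduced for illustration, and the two sample rows are made up.

```python
import csv
import io
from typing import NamedTuple


class Record(NamedTuple):
    """One row of the six-column layout (field types are assumptions)."""
    timestep: int
    vehicle_id: str
    x: float
    y: float
    real_bitrate: float
    estimated_bitrate: float


def read_records(handle, delimiter=","):
    """Parse six-field rows into Record tuples (assumes no header row)."""
    records = []
    for row in csv.reader(handle, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        t, vid, x, y, real, est = row
        records.append(
            Record(int(t), vid, float(x), float(y), float(real), float(est))
        )
    return records


def mean_abs_error(records):
    """Mean absolute gap between the real and estimated bitrate."""
    gaps = [abs(r.real_bitrate - r.estimated_bitrate) for r in records]
    return sum(gaps) / len(gaps)


# Made-up sample rows standing in for one of the three files.
sample = io.StringIO("0,veh1,10.0,20.0,5.0,4.5\n1,veh1,11.0,21.0,6.0,6.5\n")
records = read_records(sample)
print(mean_abs_error(records))
```

To read an actual file, pass an open file handle instead of the in-memory sample and adjust `delimiter` if the files turn out not to be comma-separated.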
LSASRD is a well-labeled, challenging dataset intended to facilitate research on style recognition in anime images. It collects images from 190 anime and cartoon works spanning 93 years and 13 countries and regions, taking both 2D and 3D works into consideration. We choose at most ten roles for each work. All the images are obtained from the Internet. The images in the LSASRD dataset are mainly from existing anime and cartoons, and some are from comics or games of the same anime series. Unlike illustration or video datasets, we provide a moderate amount of contextual information in a wide variety of styles. LSASRD requires image models to be capable of context understanding.
This dataset contains pre-processed versions of datasets introduced in prior works. It also contains new data pertinent to the paper.
IOPS and Latency measurements of a real data storage system