Datasets

285 machine learning datasets

Modified Swiss Dwellings

The Modified Swiss Dwellings (MSD) dataset is an ML-ready dataset for floor plan generation and analysis at building-level scale. It is derived entirely from the Swiss Dwellings database (v3.0.0). MSD contains 5,372 highly detailed floor plans of single- and multi-unit building complexes across Switzerland, extending the building scale relative to other well-known floor plan datasets such as RPLAN.

1 paper · 0 benchmarks · Graphs, Images

RoomEnv-v2 (The Room environment - v2)

The Room environment - v2

1 paper · 1 benchmark · Graphs, Texts

WikiOFGraph (Wikipedia Ontology-Free Graph-Text)

We introduce WikiOFGraph, a novel large-scale, domain-diverse dataset synthesized by LLMs, ensuring superior graph-text consistency to advance general-domain graph-to-text generation.

1 paper · 2 benchmarks · Graphs, Texts

Perfume Co-Preference Network

The Perfume Co-Preference Network dataset comprises comprehensive user reviews and ratings collected from the Persian retail platform Atrafshan. This dataset, central to our research on community detection in fragrance preferences, includes 36,434 comments from 7,387 unique users, providing insights into consumer sentiment towards various perfumes. It is designed to facilitate the analysis of user preferences through sentiment analysis, allowing for the clustering of perfumes based on shared attributes.

1 paper · 0 benchmarks · Graphs, Tables, Texts

SCG (SCG Dataset from Graph Neural Networks in Supply Chain Analytics and Optimization: Concepts, Perspectives, Dataset & Benchmarks)

Abstract: Graph Neural Networks (GNNs) have recently gained traction in transportation, bioinformatics, language and image processing, but research on their application to supply chain management remains limited. Supply chains are inherently graph-like, making them ideal for GNN methodologies, which can optimize and solve complex problems. The barriers include a lack of proper conceptual foundations, familiarity with graph applications in SCM, and real-world benchmark datasets for GNN-based supply chain research. To address this, we discuss and connect supply chains with graph structures for effective GNN application, providing detailed formulations, examples, mathematical definitions, and task guidelines. Additionally, we present a multi-perspective real-world benchmark dataset from a leading FMCG company in Bangladesh, focusing on supply chain planning. We discuss various supply chain tasks using GNNs and benchmark several state-of-the-art models on homogeneous and heterogeneous grap

1 paper · 1 benchmark · Graphs, Tables

Equilibrium-Traffic-Networks

This repository contains three graph datasets for the User Equilibrium (UE) traffic assignment problem on the Sioux-Falls, Eastern-Massachusetts, and Anaheim networks, in both DGL and PyG formats. The datasets are generated and used to train and evaluate models for solving the UE problem on these three transportation networks.

1 paper · 0 benchmarks · Graphs

Financial Dynamic Knowledge Graph

FinDKG: The Global Financial Dynamic Knowledge Graph Dataset. FinDKG is an open-source dataset focused on creating a temporally resolved Financial Dynamic Knowledge Graph. Designed to bridge the gap in industry-specific knowledge graphs, particularly in the financial sector, FinDKG provides a high-touch, temporally aware representation of global economic and market dynamics. This repository includes comprehensive details about the dataset, methodology, and schema, aiming to facilitate academic research and actionable insights in global financial markets.

1 paper · 0 benchmarks · Financial, Graphs, Texts

Data for: "Linking Datasets on Organizations Using Half a Billion Open-Collaborated Records"

Source: Linking Datasets on Organizations Using Half-a-Billion Open-Collaborated Records

1 paper · 0 benchmarks · Graphs, Texts

AnimArchCatalogue01

1 paper · 0 benchmarks · Graphs

CompMix-IR

CompMix-IR Dataset Overview:

1 paper · 0 benchmarks · Graphs, Tabular, Texts

Multi Lingual Bug Reports

The dataset used in this study comprises bug reports extracted from the Visual Studio Code GitHub repository, specifically those labeled with the english-please tag. This label indicates that the original submission was written in a language other than English, providing a clear signal for multilingual content. The dataset spans a five-year period (March 2019 to June 2024), ensuring a diverse representation of bug types, user environments, and technical contexts.

1 paper · 1 benchmark · Graphs, Images, Texts

Mechanical Metamaterial: Square Array of Circular Holes Under Deformation

This repository contains data for a research project involving graph neural networks (GNNs) applied to mechanical metamaterials and their deformations.

1 paper · 0 benchmarks · Graphs

GeoJEPAD (GeoJEPA Dataset)

GeoJEPAD is a multimodal dataset combining OpenStreetMap (OSM) data (attributes and geometries) with high-resolution aerial imagery from diverse urban areas. The data is sourced from NAIP and OSM, then processed, tiled, and cropped. Geometries and relations are represented as graphs with optional visibility edges.

1 paper · 0 benchmarks · Graphs, Images, Texts

BTS (Building Timeseries Dataset: Empowering Large-Scale Building Analytics)

The Building TimeSeries (BTS) dataset covers three buildings over a three-year period, comprising more than ten thousand timeseries data points with hundreds of unique ontologies. Moreover, the metadata is standardised in the form of a knowledge graph using the Brick schema.

1 paper · 0 benchmarks · Graphs, Time series

Shaved Ice Snowflake VM Demand Dataset (Snowflake Dataset for "Shaved Ice: Optimal Compute Resource Commitments for Dynamic Multi-Cloud Workloads" paper)

This repository contains documentation for the dataset that accompanies our ICPE 2025 paper, "Shaved Ice: Optimal Compute Resource Commitments for Dynamic Multi-Cloud Workloads". It also includes example R and Python notebooks to read and visualize the data, including scripts to reproduce the figures and analysis results in the paper.

1 paper · 0 benchmarks · Graphs, Images, Time series

ATC-GRAPH

ATC-GRAPH is the most extensive ATC benchmark dataset. All drugs in the benchmarks are linked to their Mol files instead of the SMILES sequences utilized in earlier benchmarks. This shift allows for more precise and detailed modeling and learning. In terms of scale, ATC-GRAPH surpasses Chen-2012 and ATC-SMILES by 36.78% and 16.85%, respectively. Significantly, ATC-GRAPH was curated through a cross-validation process involving multiple resources such as KEGG, PubChem, ChEMBL, ChemSpider, and ChemicalBook. This results in ATC-GRAPH being distinguished by its timeliness and comprehensive coverage across all five levels and drug genres.

1 paper · 5 benchmarks · Biomedical, Graphs

PEnG (Pose-Enhanced Geo-Localisation)

This dataset builds upon the SpaGBOL dataset, a graph-based dataset covering numerous cities across the globe for the purpose of structured city-scale Cross-View Geo-Localisation (CVGL).

1 paper · 0 benchmarks · Graphs, Images

B-XAIC

B-XAIC consists of 50K small molecules represented as graphs and includes 7 graph classification tasks, each with ground truth labels and corresponding explanations.

1 paper · 0 benchmarks · Graphs

7-digit Product-level Supply-Use and Input-Output Tables Using ASI Data

This paper constructs 7-digit product Supply-Use Tables (SUTs) and symmetric Input-Output Tables (IOTs) for the Indian economy using microdata from the Annual Survey of Industries (ASI) for the period 2016-2021. We outline the methodology for generating input flows and reconciling registered and unregistered sector data via NPCMS-NIC concordance. The transition from SUTs to IOTs is explained using the Industry Technology Assumption. We apply this framework to analyse the economic impact—specifically Domestic Value Added (DVA) and employment influenced by production and exports. A case study of India's mobile phone sector reveals significant output growth, import substitution, an increase in exports, a shift in DVA/FVA shares, notable employment growth, with a leaning towards contractual labour, and increased female participation. These tables are valuable for analysing sectoral interdependencies and industrial policy effectiveness in India.

1 paper · 0 benchmarks · Graphs, Images, Tabular, Texts, Time series

DBP-5L (Japanese)

DBP-5L is a multilingual knowledge graph (KG) dataset containing five KGs in English, French, Japanese, Greek, and Spanish. The dataset is used for the Knowledge Graph Completion and Entity Alignment tasks. DBP-5L (Japanese) is the subset of DBP-5L containing the Japanese KG.

0 papers · 0 benchmarks · Graphs
