285 machine learning datasets
This is the small version of the MuMiN dataset.
This is the medium version of the MuMiN dataset.
This is the large version of the MuMiN dataset.
A small dataset from the Inductive Link Prediction Challenge 2022. Training graph contains 10K entities, 96 relations, 78K triples. Inference graph contains 7K entities, 96 relations, 21K triples. Validation and test triples to predict belong to the inference graph.
A large dataset from the Inductive Link Prediction Challenge 2022. Training graph contains 46K entities, 130 relations, 202K triples. Inference graph contains 30K entities, 130 relations, 77K triples. Validation and test triples to predict belong to the inference graph.
The Room environment - v0
The OU-ISIR Gait Database, Multi-View Large Population Database with Pose Sequence (OUMVLP-Pose) is meant to aid research efforts in the general area of developing, testing and evaluating algorithms for model-based gait recognition.
We release 280 synthetic IAM graphs generated from the IAM graphs of commercial companies. Specifically, we vary the number of nodes but keep the graph density as is, i.e. in the range of 0.259 ± 0.198 (avg ± std). To generate a synthetic graph, we first sample the number of users and datastores from uniform distributions over the intervals [10, 150] and [50, 300] respectively, which cover the variation of those parameters across real graphs. After fixing the node counts, we sample the actual nodes with replacement from a real-world graph chosen at random. Then we add Gaussian N(0, 0.01) noise to the node embeddings and renormalize them. To match the graph density to that of the underlying baseline, we sample edges from a multinomial distribution in which each component is proportional to the cosine distance between a user embedding and a datastore embedding. We also enforce the invariant that dynamic edges are always a subset of all permission edges. A synthetic graph generated in such
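The generation steps described in this entry can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function and parameter names are hypothetical, 0.01 is read as a variance (so the noise standard deviation is 0.1), density is assumed to mean edges divided by the number of user-datastore pairs, the share of dynamic edges (`dynamic_fraction`) is invented for the example, and randomly generated unit vectors stand in for a real IAM graph.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_distance(a, b):
    """1 - cosine similarity; inputs are assumed to be unit vectors."""
    return max(0.0, 1.0 - sum(x * y for x, y in zip(a, b)))


def perturb(vec, rng, std):
    """Add Gaussian noise to each coordinate, then renormalize."""
    return normalize([x + rng.gauss(0.0, std) for x in vec])


def synthetic_graph(real_users, real_stores, density, rng,
                    noise_std=0.1, dynamic_fraction=0.3):
    # Node counts drawn uniformly from the intervals quoted in the description.
    n_users = rng.randint(10, 150)
    n_stores = rng.randint(50, 300)

    # Sample real nodes with replacement, then perturb their embeddings.
    users = [perturb(rng.choice(real_users), rng, noise_std)
             for _ in range(n_users)]
    stores = [perturb(rng.choice(real_stores), rng, noise_std)
              for _ in range(n_stores)]

    # One sampling weight per candidate (user, datastore) pair.
    pairs = [(u, s) for u in range(n_users) for s in range(n_stores)]
    weights = [cosine_distance(users[u], stores[s]) for u, s in pairs]

    # Assumed density definition: edges / (users * datastores).
    n_edges = round(density * len(pairs))
    permission = set()
    while len(permission) < n_edges:
        # Weighted draws with replacement; repeat until enough distinct edges.
        missing = n_edges - len(permission)
        permission.update(rng.choices(pairs, weights=weights, k=missing))

    # Dynamic edges are drawn from the permission edges, so they are a subset.
    n_dynamic = int(dynamic_fraction * len(permission))
    dynamic = set(rng.sample(sorted(permission), n_dynamic))
    return users, stores, permission, dynamic


# Hypothetical stand-in for a real graph: random 8-dimensional unit embeddings.
rng = random.Random(0)
real_users = [normalize([rng.gauss(0, 1) for _ in range(8)]) for _ in range(40)]
real_stores = [normalize([rng.gauss(0, 1) for _ in range(8)]) for _ in range(60)]
users, stores, permission, dynamic = synthetic_graph(
    real_users, real_stores, density=0.259, rng=rng)
```

Repeating the weighted draws until enough distinct edges exist keeps the edge count equal to the target implied by the density, at the cost of departing slightly from a single multinomial draw.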
Classifying all cells in an organ is a relevant and difficult problem from plant developmental biology. We here abstract the problem into a new benchmark for node classification in a geo-referenced graph. Solving it requires learning the spatial layout of the organ including symmetries. To allow the convenient testing of new geometrical learning methods, the benchmark of Arabidopsis thaliana ovules is made available as a PyTorch data loader, along with a large number of precomputed features.
Hypertension Disease Medication dataset.
Main Dataset city_pollution_data.csv
This is the list of all doges of the Venetian Republic, as well as their wives where a record of their existence survives. Entries include name, surname if known, and the dates of office, as well as the dates of their weddings. Data has been extracted from Wikipedia, with some errors fixed by checking against other sources.
DPB-5L is a multilingual knowledge graph (KG) dataset containing 5 KGs in English, French, Japanese, Greek, and Spanish. The dataset is used for the Knowledge Graph Completion and Entity Alignment tasks. DPB-5L (Spanish) is the subset of DPB-5L containing the Spanish KG.
pmuBAGE (the Benchmarking Assortment of Generated PMU Events) is a dataset that consists of almost 1000 instances of labeled event data to encourage benchmark evaluations on phasor measurement unit (PMU) data analytics. PMU data are challenging to obtain, especially those covering event periods. Nevertheless, power system problems have recently seen phenomenal advancements via data-driven machine learning solutions. A highly accessible standard benchmarking dataset would enable a drastic acceleration of the development of successful machine learning techniques in this field.
BeGin provides 23 benchmark scenarios for graphs from 14 real-world datasets, which cover 12 combinations of the incremental settings and the levels of problem. In addition, BeGin provides various basic evaluation metrics for measuring performance and final evaluation metrics designed for continual learning.
The Room environment - v1
ZeroKBC is a comprehensive benchmark that covers all scenarios of the zero-shot Knowledge Base Completion (KBC) task. It has 3 zero-shot scenarios with 8 fine-grained settings.
Dataset of low-fidelity resolutions of the Reynolds-averaged Navier-Stokes (RANS) equations over airfoils.
This is the set of graphs used in the Heuristic track of the PACE 2022 challenge on computing a Directed Feedback Vertex Set. It consists of 200 labelled directed graphs. The graphs are mostly not symmetric (an edge from u->v does not imply an edge from v->u), although some are symmetric. The graph labels are integers ranging from 1 to N.
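The symmetry property mentioned here can be checked with a short sketch like the one below. It assumes the graph has already been loaded as a list of `(u, v)` integer pairs; the helper names and the toy edge list are hypothetical and do not reflect the challenge's file format.

```python
def is_symmetric(edges):
    """Return True if every edge u->v has a matching reverse edge v->u."""
    edge_set = set(edges)
    return all((v, u) in edge_set for u, v in edge_set)


def asymmetric_edges(edges):
    """List the edges u->v whose reverse edge v->u is missing, in sorted order."""
    edge_set = set(edges)
    return sorted((u, v) for u, v in edge_set if (v, u) not in edge_set)


# Toy directed graph on vertices 1..3 (illustrative only).
edges = [(1, 2), (2, 1), (2, 3)]
symmetric = is_symmetric(edges)
missing_reverse = asymmetric_edges(edges)
```

Storing the edges in a set makes each reverse-edge lookup constant time on average, so the check is linear in the number of edges.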