Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

285 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

AutoFR Dataset

The AutoFR Dataset is broken down by crawled site within a zip file. It covers the multiple experiments that we conducted in our paper. Overall, the dataset contains 1042 sites within the Top-5K that we crawled and on which we detected ads.

1 paper | 0 benchmarks | Graphs

Simulated wind farm graph dataset (floris-wind-farm-dataset)

FLORIS farm dataset: a dataset for graph neural network modeling of wind farms. The current version of the dataset contains two farms with very different geometry but similar inter-turbine statistics. The wind farms were simulated with the steady-state wake model FLORIS.

1 paper | 0 benchmarks | Graphs

data_qe (Federal Reserve Quantitative Easing Data)

This file contains the data and code for the publication "The Federal Reserve's Response to the Global Financial Crisis and Its Long-Term Impact: An Interrupted Time-Series Natural Experimental Analysis" by A. C. Kamkoum, 2023.

1 paper | 0 benchmarks | Graphs, Tables, Time series

Myket Android Application Install

This dataset contains application install interactions of users in the Myket Android application market. It was created to evaluate interaction prediction models, which require user and item identifiers along with the timestamps of the interactions. Hence, the dataset can be used for interaction prediction and for building a recommendation system. Furthermore, the data forms a dynamic network of interactions, so network representation learning can also be performed on the nodes of the network, which are users and applications.

1 paper | 0 benchmarks | Graphs, Time series
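
As a rough illustration of how timestamped install records can be viewed as a dynamic interaction network, here is a minimal Python sketch. The record layout `(user_id, app_id, timestamp)`, the toy values, and the helper names `build_dynamic_network` and `snapshot` are all assumptions made for this example; they are not the dataset's actual schema or tooling.

```python
from collections import defaultdict

# Hypothetical toy records (user_id, app_id, timestamp); not real Myket data.
interactions = [
    ("u1", "appA", 3),
    ("u2", "appA", 1),
    ("u1", "appB", 2),
    ("u3", "appB", 5),
]


def build_dynamic_network(records):
    """Return time-ordered edges and a bipartite adjacency map."""
    # Sort by timestamp so the edge list replays interactions in order.
    edges = sorted(records, key=lambda r: r[2])
    adjacency = defaultdict(list)
    for user, app, ts in edges:
        # Tag node ids with their type so users and apps never collide.
        adjacency[("user", user)].append((("app", app), ts))
        adjacency[("app", app)].append((("user", user), ts))
    return edges, dict(adjacency)


def snapshot(edges, until):
    """Edges observed up to and including time `until`."""
    return [e for e in edges if e[2] <= until]


edges, adjacency = build_dynamic_network(interactions)
```

An interaction prediction model would typically train on an early `snapshot` and be evaluated on the edges that arrive later.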

USPTO-30K

We introduce USPTO-30K, a large-scale benchmark dataset of annotated molecule images, which overcomes these limitations. It is created using pairs of images and MolFiles from the United States Patent and Trademark Office. Each molecule was independently selected among all the available documents from 2001 to 2020. The set consists of three subsets to decouple the study of clean molecules, molecules with abbreviations, and large molecules.

1 paper | 0 benchmarks | Graphs, Images

MolGrapher-Synthetic-300K

The set is created using molecule SMILES retrieved from the database PubChem. Images are then generated from SMILES using the molecule drawing library RDKit. The synthetic set is augmented at multiple levels:

1 paper | 0 benchmarks | Graphs, Images

Graph dataset MOLT-4 (MOLT-4)

Dataset introduced by Xifeng Yan et al.

1 paper | 0 benchmarks | Graphs

Graph dataset MCF-7 (MCF-7)

Dataset introduced by Xifeng Yan et al.

1 paper | 0 benchmarks | Graphs

IMCPT-SparseGM-50

The IMCPT-SparseGM dataset is a new visual graph matching benchmark addressing partial matching and graphs with larger sizes, based on the novel stereo benchmark Image Matching Challenge PhotoTourism (IMC-PT) 2020. The dataset was released with the CVPR 2023 paper "Deep Learning of Partial Graph Matching via Differentiable Top-K".

1 paper | 1 benchmark | Graphs

IMCPT-SparseGM-100

The IMCPT-SparseGM dataset is a new visual graph matching benchmark addressing partial matching and graphs with larger sizes, based on the novel stereo benchmark Image Matching Challenge PhotoTourism (IMC-PT) 2020. The dataset was released with the CVPR 2023 paper "Deep Learning of Partial Graph Matching via Differentiable Top-K".

1 paper | 1 benchmark | Graphs

CIRO experimental results

This repository includes the experiment results, source code, and test data for Three Cs risk inference using CIRO (COVID-19 Infection Risk Ontology) and HermiT.

1 paper | 0 benchmarks | Graphs

GEval for KGRC-RDF-star

This repository is an extension of GEval. It contains a software framework for evaluating and comparing RDF-star graph embedding techniques. The gold standard datasets for evaluation were created from KGRC-RDF-star.

1 paper | 0 benchmarks | Graphs

HatefulDiscussions

Multi-Modal Hate Speech Detection with Graph Context.

1 paper | 0 benchmarks | Graphs, Images, Texts

Genre2Movies (Compositional queries for Movie recommendation)

Genre annotations for movies. The file genre2movies.csv contains genre-movie tuples based on Wikidata annotations (https://www.wikidata.org/).

1 paper | 0 benchmarks | Graphs, Ranking, Tabular

Synthetic Dynamic Networks (from Aging, Fitness Preferential Attachment mechanisms)

This dataset accompanies the paper "Learning the mechanisms of network growth" by the same authors. The dataset contains 6733 networks of size 20,000, each generated according to a different combination of three mechanisms: fitness, aging, and preferential attachment. The goal is to use machine learning to identify the combination of mechanisms that was used to create the network. The dataset includes static features from the literature and two versions of our newly developed dynamic features.

1 paper | 2 benchmarks | Graphs
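
To make the idea of combining growth mechanisms concrete, here is a minimal Python sketch of a network grown with optional preferential attachment, fitness, and aging. The listing does not describe the dataset's actual generator, so everything here is an assumption for illustration: the function name `grow_network`, the multiplicative way the three mechanisms are combined, the `1 / age` form of aging, and the seed clique. The quadratic loop only suits small toy networks.

```python
import random


def grow_network(n_nodes, m=2, use_pa=True, use_fitness=True,
                 use_aging=True, seed=0):
    """Grow a network where each new node links to m distinct older nodes.

    Assumed attachment weight of an older node (illustrative only):
    degree (preferential attachment) * fitness / age, with each factor
    switched on or off by its flag. Requires m >= 1 and n_nodes >= m + 1.
    """
    rng = random.Random(seed)
    # Hypothetical fitness values, kept strictly positive.
    fitness = [rng.random() + 0.1 for _ in range(n_nodes)]
    degree = [0] * n_nodes
    edges = []

    # Seed graph: a clique on the first m + 1 nodes, so every degree is > 0.
    for i in range(m + 1):
        for j in range(i):
            edges.append((i, j))
            degree[i] += 1
            degree[j] += 1

    for new in range(m + 1, n_nodes):
        weights = []
        for old in range(new):
            w = 1.0
            if use_pa:
                w *= degree[old]
            if use_fitness:
                w *= fitness[old]
            if use_aging:
                w /= (new - old)  # older nodes are less attractive
            weights.append(w)
        # Draw until m distinct targets are found.
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(range(new), weights=weights)[0])
        for t in sorted(targets):
            edges.append((new, t))
            degree[new] += 1
            degree[t] += 1
    return edges, degree


edges, degree = grow_network(50, m=2)
```

Toggling the three flags yields the different mechanism combinations; a classifier would then try to recover which flags were active from features of the resulting network.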

Twitter-HyDrug-UR (Twitter Hypergraph Drug for User Roles)

This benchmark hypergraph dataset, Twitter-HyDrug-UR, is derived from Twitter-HyDrug by HyGCL-DC. Twitter-HyDrug-UR is a real-world hypergraph dataset that describes drug trafficking on Twitter. Unlike HyGCL-DC, which targets a drug trafficking community detection task (a multi-label node classification), we aim to identify drug user roles in drug trafficking activities on social media. To this end, we categorize node labels into four distinct roles: drug seller, drug buyer, drug user, and drug discussant, and each node is assigned exactly one label. Consequently, we frame the problem for Twitter-HyDrug-UR as a multi-class node classification task.

1 paper | 1 benchmark | Graphs

Healthcare Provider Fraud Detection Analysis

Inpatient claims, Outpatient claims and Beneficiary details of each provider.

1 paper | 4 benchmarks | Graphs, Tabular

dafont

Download free fonts in DaFont style from our extensive collection. Find bold, italic, cursive, futuristic fonts, and more. Enhance your projects with unique and stylish typography today!

1 paper | 0 benchmarks | Graphs, Texts

MAPLE

The MAPLE benchmark constructed by us contains 20 datasets across 19 fields for scientific literature tagging. It also has a graph format, which can be used for graph mining tasks (e.g., node classification, link prediction). Refer to its homepage for more details.

1 paper | 0 benchmarks | Graphs

HALvest-Geometric

HALvest-Geometric is a subset of HALvest: an academic citation network with 238,397 disambiguated authors and 18,662,037 scholarly papers.

1 paper | 0 benchmarks | Graphs, Texts