TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

285 machine learning datasets

Filter by Modality

  • Images3,275
  • Texts3,148
  • Videos1,019
  • Audio486
  • Medical395
  • 3D383
  • Time series298
  • Graphs285
  • Tabular271
  • Speech199
  • RGB-D192
  • Environment148
  • Point cloud135
  • Biomedical123
  • LiDAR95
  • RGB Video87
  • Tracking78
  • Biology71
  • Actions68
  • 3d meshes65
  • Tables52
  • Music48
  • EEG45
  • Hyperspectral images45
  • Stereo44
  • MRI39
  • Physics32
  • Interactive29
  • Dialog25
  • Midi22
  • 6D17
  • Replay data11
  • Financial10
  • Ranking10
  • Cad9
  • fMRI7
  • Parallel6
  • Lyrics2
  • PSG2
Clear filter

285 dataset results

dockstring

Regression dataset for molecular docking scores (predicted molecule-protein binding affinity). Contains ~250,000 molecules against 58 protein targets.

18 papers0 benchmarksGraphs

Elliptic Dataset

Click to add a brief description of the dataset (Markdown and LaTeX enabled).

18 papers4 benchmarksGraphs

MalNet

MalNet is a large public graph database, representing a large-scale ontology of software function call graphs. MalNet contains over 1.2 million graphs, averaging over 17k nodes and 39k edges per graph, across a hierarchy of 47 types and 696 families.

17 papers1 benchmarksGraphs

Chameleon(60%/20%/20% random splits)

Node classification on Chameleon with 60%/20%/20% random splits for training/validation/test.

17 papers2 benchmarksGraphs

SketchGraphs

SketchGraphs is a dataset of 15 million sketches extracted from real-world CAD models intended to facilitate research in both ML-aided design and geometric program induction. Each sketch is represented as a geometric constraint graph where edges denote designer-imposed geometric relationships between primitives, the nodes of the graph.

16 papers0 benchmarksGraphs, Images

Wiki-One

This dataset is a Wikipedia dump, split by relations to perform Few-Shot Knowledge Graph Completion.

16 papers0 benchmarksGraphs

Cornell (60%/20%/20% random splits)

Node classification on Cornell with 60%/20%/20% random splits for training/validation/test.

16 papers1 benchmarksGraphs

Texas(60%/20%/20% random splits)

Node classification on Texas with 60%/20%/20% random splits for training/validation/test.

16 papers1 benchmarksGraphs

Cornell (48%/32%/20% fixed splits)

Node classification on Cornell with the fixed 48%/32%/20% splits provided by Geom-GCN.

16 papers2 benchmarksGraphs

BeerAdvocate

BeerAdvocate is a dataset that consists of beer reviews from beeradvocate. The data span a period of more than 10 years, including all ~1.5 million reviews up to November 2011. Each review includes ratings in terms of five "aspects": appearance, aroma, palate, taste, and overall impression. Reviews include product and user information, followed by each of these five ratings, and a plaintext review.

15 papers2 benchmarksGraphs

Wisconsin (48%/32%/20% fixed splits)

Node classification on Wisconsin with the fixed 48%/32%/20% splits provided by Geom-GCN.

15 papers2 benchmarksGraphs

Cora (48%/32%/20% fixed splits)

Node classification on Cora with the fixed 48%/32%/20% splits provided by Geom-GCN.

15 papers2 benchmarksGraphs

Citeseer (48%/32%/20% fixed splits)

Node classification on Citeseer with the fixed 48%/32%/20% splits provided by Geom-GCN.

15 papers2 benchmarksGraphs

PubMed (48%/32%/20% fixed splits)

Node classification on PubMed with the fixed 48%/32%/20% splits provided by Geom-GCN.

15 papers2 benchmarksGraphs

Linux (Linux Program Dependence Graphs)

The LINUX dataset consists of 48,747 Program Dependence Graphs (PDG) generated from the Linux kernel. Each graph represents a function, where a node represents one statement and an edge represents the dependency between the two statements

14 papers0 benchmarksGraphs

Texas (48%/32%/20% fixed splits)

Node classification on Texas with the fixed 48%/32%/20% splits provided by Geom-GCN.

14 papers2 benchmarksGraphs

Film(48%/32%/20% fixed splits)

Node classification on Film with the fixed 48%/32%/20% splits provided by Geom-GCN.

14 papers2 benchmarksGraphs

Mutagenicity

Mutagenicity is a chemical compound dataset of drugs, which can be categorized into two classes: mutagen and non-mutagen.

13 papers2 benchmarksGraphs

MovieGraphs

Provides detailed, graph-based annotations of social situations depicted in movie clips. Each graph consists of several types of nodes, to capture who is present in the clip, their emotional and physical attributes, their relationships (i.e., parent/child), and the interactions between them. Most interactions are associated with topics that provide additional details, and reasons that give motivations for actions.

13 papers0 benchmarksGraphs

UPFD (User Preference-aware Fake News Detection)

For benchmarking, please refer to its variant UPFD-POL and UPFD-GOS.

13 papers0 benchmarksGraphs, Texts
PreviousPage 5 of 15Next