Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,148 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

3,148 dataset results

PMOA-CITE

The dataset used in the experiments of the paper "Modeling citation worthiness by using attention‑based bidirectional long short‑term memory networks and interpretable models".

2 papers · 0 benchmarks · Texts

PanCancer Multimodal (HoneyBee)

Dataset Card for The Cancer Genome Atlas (TCGA) Multimodal Dataset.

2 papers · 0 benchmarks · Images, Medical, Tabular, Texts

MANGO

Large language models such as ChatGPT and GPT-4 have recently achieved astonishing performance on a variety of natural language processing tasks. In this paper, we propose MANGO, a benchmark to evaluate their ability to perform text-based mapping and navigation. Our benchmark includes 53 mazes taken from a suite of text games: each maze is paired with a walkthrough that visits every location but does not cover all possible paths. The task is question answering: for each maze, a large language model reads the walkthrough and answers hundreds of mapping and navigation questions such as "How should you go to Attic from West of House?" and "Where are we if we go north and east from Cellar?". Although these questions are easy for humans, it turns out that even GPT-4, the best language model to date, performs poorly when answering them. Further, our experiments suggest that a strong mapping and navigation ability would benefit the performance of large language models on relevant downstream tasks.

2 papers · 0 benchmarks · Texts

Human Simulacra

Human Simulacra is a virtual character dataset that contains 129k texts across 11 virtual characters, with each character having unique attributes, biographies, and stories.

2 papers · 0 benchmarks · Texts

ClimateIQA

The dataset was created to address the need for effective Extreme Weather Events Detection (EWED), an increasingly urgent task given the rising frequency of such events driven by global warming. Traditional methods for EWED rely on numerical threshold setting and on the analysis of weather anomaly heatmaps that visualize data such as temperature, wind speed, and precipitation. However, these methods often involve manual work and can be time-consuming and error-prone. While advances in AI have led to machine learning models such as Convolutional Neural Networks (CNNs) for weather prediction and EWED, these models predominantly use numeric data and often yield low accuracy. Moreover, although Large Language Models (LLMs) are proficient at generating textual weather reports, they struggle to interpret visual data, which is crucial for EWED. General Vision-Language Models (VLMs) also face challenges in accurately interpreting meteorological heatmaps.

2 papers · 0 benchmarks · Environment, Images, Texts

LUMA (Learning from Uncertain and Multimodal Data)

LUMA is a multimodal dataset that consists of audio, image, and text modalities. It allows controlled injection of uncertainties into the data and is mainly intended for studying uncertainty quantification in multimodal classification settings. This repository provides the Audio and Text modalities. The image modality consists of images from CIFAR-10/100 datasets. To download the image modality and compile the dataset with a specified amount of uncertainties, please use the LUMA compilation tool.

2 papers · 0 benchmarks · Audio, Images, Texts

DTGB (Dynamic Text-attributed Graph Benchmark)

We introduce Dynamic Text-attributed Graph Benchmark (DTGB), a collection of large-scale, time-evolving graphs from diverse domains, with nodes and edges enriched by dynamically changing text attributes and categories. To facilitate the use of DTGB, we design standardized evaluation procedures based on four real-world use cases: future link prediction, destination node retrieval, edge classification, and textual relation generation. These tasks require models to understand both dynamic graph structures and natural language, highlighting the unique challenges posed by DyTAGs.

2 papers · 0 benchmarks · Graphs, Texts, Time series

MMSD2.0 (Towards a Reliable Multi-modal Sarcasm Detection System)

Multi-modal sarcasm detection has attracted much recent attention. Nevertheless, the existing benchmark (MMSD) has shortcomings that hinder the development of reliable multi-modal sarcasm detection systems: (1) MMSD contains spurious cues that lead to biased model learning; (2) its negative samples are not always reasonable. To solve these issues, we introduce MMSD2.0, a corrected dataset that fixes the shortcomings of MMSD by removing the spurious cues and re-annotating the unreasonable samples. Meanwhile, we present a novel framework called multi-view CLIP that is capable of leveraging multi-grained cues from multiple perspectives (i.e., the text, image, and text-image interaction views) for multi-modal sarcasm detection. Extensive experiments show that MMSD2.0 is a valuable benchmark for building reliable multi-modal sarcasm detection systems and that multi-view CLIP significantly outperforms the previous best baselines (with a 5.6% improvement).

2 papers · 0 benchmarks · Images, Texts

QuRe

Generalized quantifiers (e.g., few, most) indicate the proportion to which a predicate is satisfied. QuRe is a quantifier reasoning dataset from "Pragmatic Reasoning Unlocks Quantifier Semantics for Foundation Models". It includes real-world sentences from Wikipedia and human annotations of generalized quantifiers from English speakers.

2 papers · 0 benchmarks · Texts

E.T. the Exceptional Trajectories


2 papers · 6 benchmarks · 3D, 3D meshes, Texts, Videos

WDC-PAVE (Web Data Commons - Product Attribute Value Extraction)

The dataset contains 1,420 human-annotated product offers, systematically selected from the Web Data Commons Product Matching Corpus and featuring 24,582 annotated attribute-value pairs, making it a valuable resource for both product attribute-value extraction and product matching tasks. The normalized gold standard contains the standardized attribute-value pairs.

2 papers · 2 benchmarks · Texts

DART-Math-Hard

🎯 DART-Math

2 papers · 0 benchmarks · Texts

FuLG

FuLG is a comprehensive Romanian language corpus of 150 billion tokens carefully extracted from Common Crawl. This extensive dataset is the result of rigorous filtering and deduplication applied to 95 Common Crawl snapshots. Compressed, the dataset is 289 GB.

2 papers · 0 benchmarks · Texts

NoW (Noise of Web)

Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark for robust image-text matching/retrieval models. It contains 100K image-text pairs consisting of website pages and multilingual website meta-descriptions (98,000 pairs for training, 1,000 for validation, and 1,000 for testing). NoW has two main characteristics: it requires no human annotations, and its noisy pairs are captured naturally. The source images of NoW were obtained by taking screenshots while accessing web pages on a mobile user interface (MUI) at 720 × 1280 resolution, and the captions were parsed from the meta-description field of the HTML source code. In NCR (the predecessor of NCL), each image in every dataset was preprocessed with the Faster R-CNN detector provided by the Bottom-up Attention Model to generate 36 region proposals, each encoded as a 2048-dimensional feature. Following NCR, we therefore release the features instead of the raw images for fair comparison.

2 papers · 0 benchmarks · Images, Texts

ParaMAWPS (Paraphrased Math Word Problem Solving Repository)

This repository contains the code, data, and models of the paper titled "Math Word Problem Solving by Generating Linguistic Variants of Problem Statements" published in the Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop).

2 papers · 4 benchmarks · Texts

SpCQL (Text-to-CQL)

The first Text-to-CQL dataset, containing natural language queries (in Mandarin) annotated with their Cypher equivalents. It is made up of:
  • A Neo4j database
  • 10,000 pairs of Text-Cypher queries

2 papers · 0 benchmarks · Texts

RMCBench

The first benchmark comprising 473 prompts designed to assess the ability of LLMs to resist malicious code generation.

2 papers · 0 benchmarks · Texts

UK Biobank Brain MRI (UK Biobank Data - Brain MRI)

UK Biobank participants have generously provided a very wide range of information about their health and well-being since recruitment began in 2006.

2 papers · 2 benchmarks · 3D, Images, Texts, Time series

MECD (Multi-Event Causal Discovery)


2 papers · 4 benchmarks · Texts, Videos

TruthQuest

A benchmark for suppositional reasoning based on the principles of knights and knaves puzzles. Knights and knaves problems represent a classic genre of logical puzzles where characters either tell the truth or lie. The objective is to logically deduce each character's identity based on their statements. The challenge arises from the truth-telling or lying behavior, which influences the logical implications of each statement.

2 papers · 0 benchmarks · Texts
Page 99 of 158