Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

ARC (The Abstraction and Reasoning Corpus)

The Abstraction and Reasoning Corpus (ARC) is a dataset created by François Chollet in 2019. It’s designed to measure the gap between machine and human learning. The dataset consists of 1000 image-based reasoning tasks. Each task provides an input image and asks for an output image. The goal is to solve these tasks using a system that can understand and learn abstract concepts, and apply reasoning skills to generate the correct output. This dataset poses a significant challenge for AI systems and is used to advance research in artificial intelligence and machine learning.

2 papers | 0 benchmarks

Shortcut QA

Recent applications of LLMs in Machine Reading Comprehension (MRC) systems have shown impressive results, but the use of shortcuts, mechanisms triggered by features spuriously correlated to the true label, has emerged as a potential threat to their reliability. We analyze the problem from two angles: LLMs as editors, guided to edit text to mislead LLMs; and LLMs as readers, who answer questions based on the edited text. We introduce a framework that guides an editor to add potential shortcut triggers to samples. Using GPT4 as the editor, we find it can successfully edit shortcut triggers into samples so that they fool LLMs. Analysing LLMs as readers, we observe that even capable LLMs can be deceived using shortcut knowledge. Strikingly, we discover that GPT4 can be deceived by its own edits (15% drop in F1). Our findings highlight inherent vulnerabilities of LLMs to shortcut manipulations. We publish ShortcutQA, a curated dataset generated by our framework for future research.

2 papers | 0 benchmarks

GQA-OOD

GQA-OOD is a dataset and benchmark for evaluating VQA models in out-of-distribution (OOD) settings.

2 papers | 0 benchmarks | Images, Texts

KGRC-RDF-star

KGRC-RDF-star is an RDF-star dataset converted from KGRC-RDF, a knowledge graph dataset of novel stories.

2 papers | 0 benchmarks | Graphs

ARAS (Action with RAre Scene)

Action with RAre Scene is a small-scale dataset collected from YouTube. By definition, it includes video clips of human actions with rare scenes or backgrounds; the action categories all fall within the Kinetics-400 action classes.

2 papers | 0 benchmarks

MegaNegRaising

The MegaNegRaising dataset, also known as MegaNeRd, is a collection of data that captures patterns of neg-raising inferences and acceptability judgments for 925 clause-embedding verbs of English in various syntactic structures. It is part of a larger project that investigates lexically triggered inferences across clause-embedding verbs in English.

2 papers | 0 benchmarks

Social Support

Online social platforms serve a critical role for individuals as they seek to fill informational and emotional needs, from informational support like advice to emotional support like expressions of sympathy, frequently by interacting with others. The supportive replies of others help promote personal well-being, yet unsupportive replies can not only lead to distress but discourage online engagement altogether. In this work, we aim to study support in general - everyday interactions - drawing upon theories of how support is expressed in language. Our work is motivated by an agenda of promoting supportive online platforms where people can participate equally.

2 papers | 0 benchmarks

SweRec

The SweRec dataset in ScandEval is a Swedish language dataset used for text classification tasks. It contains strings of text, each associated with a label indicating the sentiment of the text. The labels are "positive", "negative", or "neutral", representing the sentiment expressed in the text.

2 papers | 0 benchmarks

Stack Exchange

The Stack Exchange dataset is a collection of data from various Stack Exchange sites, including Stack Overflow, Mathematics, Super User, and many others. It includes questions, answers, comments, tags, and other related data from these sites.

2 papers | 0 benchmarks

LanguageNet

LanguageNet (English) is a collection of sentence-level paraphrases from Twitter, built by linking tweets through shared URLs. The corpus is the largest to date, with 51,524 human-annotated sentence pairs: 42,200 for training and 9,324 for testing. It can grow by 30,000 new sentential paraphrases per month with ~70% precision. One year of data is now available: 2,869,657 candidate pairs.

2 papers | 0 benchmarks

PPC (Polish Paraphrase Corpus)

The Polish Paraphrase Corpus (PPC) is a dataset consisting of 7,000 manually labeled sentence pairs in Polish. It was created to verify how machine learning models perform on the challenging problem of paraphrase identification, where most records contain semantically overlapping parts. The dataset is divided into training, validation, and test splits, and each record is assigned to one of three categories: exact paraphrases, close paraphrases, or non-paraphrases. The corpus was created by automatically generating candidate pairs and then manually labeling them. The sentence pairs were drawn from different data sources, including Tatoeba, Polish news articles, Wikipedia, and the Polish version of the SICK dataset.

2 papers | 0 benchmarks

TTC (Tatoeba Translation Challenge)

This is a challenge set for machine translation that contains 32G translation units in 2,539 bitexts. The whole data set covers 487 languages linked to each other in 4,024 language pairs. The package includes a release of 657 test sets derived from Tatoeba.org that cover 138 languages. Training data is compiled from various sources collected within the OPUS project.

2 papers | 0 benchmarks

Visual Near-Duplicates Detection in the Context of Social Media

The dataset of the paper "Dataset and Case Studies for Visual Near-Duplicates Detection in the Context of Social Media", by Hana Matatov, Mor Naaman, and Ofra Amir.

2 papers | 0 benchmarks

EmoTag1200

The EmoTag1200 dataset is a collection of resources for analyzing the emotion and sentiment of emojis as well as tweets written in English. The name EmoTag indicates its usefulness in exploiting emojis for emotional tagging.

2 papers | 0 benchmarks

VIST-Character

2 papers | 0 benchmarks

Avocado research email collection

Avocado Research Email Collection consists of emails and attachments taken from 279 accounts of a defunct information technology company referred to as "Avocado". Most of the accounts are those of Avocado employees; the remainder represent shared accounts such as "Leads", or system accounts such as "Conference Room Upper Canada".

2 papers | 0 benchmarks

Wastewater catchment areas in Great Britain

Wastewater catchment area data are essential for wastewater treatment capacity planning and have recently become critical for operationalising wastewater-based epidemiology (WBE) for COVID-19. Owing to the privatised nature of the water industry in the United Kingdom, the required catchment area datasets are not readily available to researchers. Here, we present a consolidated dataset of 7,537 catchment areas from ten sewerage service providers in Great Britain, covering more than 96% of the population of England and Wales.

2 papers | 0 benchmarks | Environment

InSpaceType (Indoor Space Type Dataset for Monocular Depth Analysis)

A high-quality indoor monocular depth estimation dataset with a focus on performance variation across space types.

2 papers | 0 benchmarks | 3D, Images, RGB-D

Forward-Looking Sonar Marine Debris Datasets

This dataset is made up of forward-looking sonar images containing ten classes of underwater debris. It can be used for segmentation or object detection, for example to train computer vision models for underwater robotics.

2 papers | 2 benchmarks | Images

WyzeRule

The Wyze Rule Recommendation Dataset is a large dataset covering 300,000 users. Please cite [1] if you use the dataset and [2] if you reference the algorithm.

2 papers | 0 benchmarks | Graphs, Tabular