3,148 machine learning datasets
FormulaNet is a new large-scale Mathematical Formula Detection dataset. It consists of 46,672 pages of STEM documents from arXiv and has 13 types of labels. The dataset is split into a train set of 44,338 pages and a validation set of 2,334 pages. Due to copyright restrictions, only the list of papers can be provided; the papers must be downloaded and processed separately.
We introduce KPI-EDGAR, a novel dataset for Joint Named Entity Recognition and Relation Extraction building on financial reports uploaded to the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, where the main objective is to extract Key Performance Indicators (KPIs) from financial documents (the named entity recognition part) and link them to their numerical values (the relation extraction part).
This paper is a condensed report on the second year of the Touché shared task on argument retrieval held at CLEF 2021. With the goal of providing a collaborative platform for researchers, we organized two tasks: (1) supporting individuals in finding arguments on controversial topics of social importance and (2) supporting individuals with arguments in personal everyday comparison situations.
S-TEST is a benchmark for measuring the specificity of the language of pre-trained language models.
ENTIGEN is a benchmark dataset to evaluate the change in image generations conditional on ethical interventions across three social axes -- gender, skin color, and culture. It contains 246 prompts based on an attribute set containing diverse professions, objects, and cultural scenarios.
arXivEdits is an annotated corpus of 751 full papers from arXiv with gold sentence alignments across their multiple revised versions, as well as fine-grained span-level edits and their underlying intentions for 1,000 sentence pairs. The dataset is designed for studying the human revision process in the scientific writing domain.
CodeSyntax is a large-scale dataset of programs annotated with the syntactic relationships in their corresponding abstract syntax trees. It contains 18,701 code samples annotated with 1,342,050 relation edges in 43 relation types for Python, and 13,711 code samples annotated with 864,411 relation edges in 39 relation types for Java. It is designed to evaluate the performance of language models on code syntax understanding.
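The kind of syntactic relation edges CodeSyntax annotates can be illustrated with Python's standard `ast` module. The sketch below extracts (parent node, child node, field) triples from a program's abstract syntax tree; it is a simplified stand-in, assuming node-type-level edges, whereas the dataset's actual relation types and token-level alignments are more fine-grained.

```python
import ast

def relation_edges(source: str):
    """Extract (parent, child, field) edges from a Python AST.

    A simplified illustration of syntactic relation edges; the
    CodeSyntax dataset defines its own, finer-grained relation types.
    """
    tree = ast.parse(source)
    edges = []
    for parent in ast.walk(tree):
        for field, value in ast.iter_fields(parent):
            # A field may hold a single child node or a list of them.
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, ast.AST):
                    edges.append((type(parent).__name__,
                                  type(child).__name__, field))
    return edges

edges = relation_edges("x = a + b")
```

For `x = a + b`, the edges include an Assign→BinOp edge via the `value` field and Assign→Name via `targets`, mirroring how a relation-typed edge links syntactically related code elements.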
Timely and effective response to humanitarian crises requires quick and accurate analysis of large amounts of text data, a process that can highly benefit from expert-assisted NLP systems trained on validated and annotated data in the humanitarian response domain. To enable the creation of such NLP systems, we introduce and release HumSet, a novel and rich multilingual dataset of humanitarian response documents annotated by experts in the humanitarian response community. The dataset provides documents in three languages (English, French, Spanish) and covers a variety of humanitarian crises from 2018 to 2021 across the globe. For each document, HumSet provides selected snippets (entries) as well as classes assigned to each entry, annotated using common humanitarian information analysis frameworks. HumSet also provides novel and challenging entry extraction and multi-label entry classification tasks. In this paper, we take a first step towards approaching these tasks and conduct a set of experiments.
CLSE is an augmented version of the Schema-Guided Dialog Dataset. The corpus includes 34 languages and covers 74 different semantic types to support various applications from airline ticketing to video games.
Paper2Fig100k is a dataset with over 100k images of figures and texts from research papers. The figures show architecture diagrams and methodologies of articles available at arXiv.org from fields like artificial intelligence and computer vision. Figures usually include text and discrete objects, e.g., boxes in a diagram, with lines and arrows that connect them.
The Amharic–English Parallel Corpus for Machine Translation contains 33,955 sentence pairs extracted from news platforms such as the Ethiopian Press Agency, Fana Broadcasting Corporate, and the Walta Information Center. Because the data comes from different sources, it covers various domains, including religious texts (Bible and Quran), politics, economics, sports, and news.
Concise comprises two datasets of 2,000 sentences each, annotated by two and five human annotators, respectively. They are designed for the new task of making sentences concise.
DIALOCONAN is a dataset comprising over 3000 fictitious multi-turn dialogues between a hater and an NGO operator, covering 6 targets of hate.
SciHTC is a dataset for hierarchical multi-label text classification (HMLTC) of scientific papers, containing 186,160 papers and 1,233 categories from the ACM CCS tree.
ToM-in-AMC (short for Theory-of-Mind meta-learning Assessment with Movie Characters) is a novel NLP benchmark. It consists of 1,000 parsed movie scripts, each corresponding to a few-shot character understanding task.
Factual Inconsistency Benchmark (FIB) is a benchmark focused on the task of summarization. Specifically, it involves comparing the scores an LLM assigns to a factually consistent versus a factually inconsistent summary of an input news article. The factually consistent summaries are human-written reference summaries that have been manually verified as factually consistent.
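The FIB evaluation setup described above can be sketched as a pairwise comparison: a model "passes" an example if it scores the consistent summary higher than the inconsistent one. The `score` function here is a hypothetical placeholder for whatever scoring an LLM provides (e.g., a length-normalized log-likelihood); the toy word-overlap scorer in the usage example is purely illustrative.

```python
def prefers_consistent(score, article, consistent, inconsistent):
    """Return True if the scoring function ranks the factually
    consistent summary above the factually inconsistent one.

    `score(article, summary)` is any model scoring function; in the
    benchmark this would come from an LLM (hypothetical here).
    """
    return score(article, consistent) > score(article, inconsistent)

def toy_score(article, summary):
    # Hypothetical stand-in scorer: fraction of summary words
    # that also appear in the article.
    article_words = set(article.split())
    words = summary.split()
    return sum(w in article_words for w in words) / max(len(words), 1)
```

Accuracy over the benchmark would then be the fraction of examples for which `prefers_consistent` returns True.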
EmoPars is a dataset of 30,000 Persian Tweets labeled with Ekman’s six basic emotions (Anger, Fear, Happiness, Sadness, Hatred, and Wonder). This is the first publicly available emotion dataset in the Persian language.
Chinese Spelling Correction Dataset for errors generated by pinyin IME (CSCD-IME) is a dataset containing 40,000 annotated sentences from real posts of official media accounts on Sina Weibo. It is designed to detect and correct spelling mistakes in Chinese texts.
RoMQA is a benchmark for robust, multi-evidence, and multi-answer question answering (QA). RoMQA contains clusters of questions that are derived from related constraints mined from the Wikidata knowledge graph. The dataset evaluates robustness of QA models to varying constraints by measuring worst-case performance within each question cluster.
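The worst-case-within-cluster metric described above can be sketched as follows: group per-question results by cluster and take the minimum score in each group, so a cluster with binary correctness scores 1 only if every question in it is answered correctly. The data layout (a list of `(cluster_id, correct)` pairs) is an assumption for illustration, not RoMQA's actual file format.

```python
from collections import defaultdict

def worst_case_by_cluster(results):
    """Compute worst-case performance per question cluster.

    results: iterable of (cluster_id, score) pairs, where score may be
    a boolean correctness flag or any numeric per-question metric.
    Returns the minimum score observed within each cluster.
    """
    clusters = defaultdict(list)
    for cluster_id, score in results:
        clusters[cluster_id].append(score)
    return {cid: min(scores) for cid, scores in clusters.items()}
```

Averaging these per-cluster minima then gives a robustness-oriented aggregate, in contrast to averaging over individual questions.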
Nouns extracted automatically from Bible translations across 1,580 languages.