Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Browse State-of-the-Art

40,176 benchmarks across 2,101 tasks

All, Methodology, Computer Vision, Natural Language Processing, Medical, Miscellaneous, Time Series, Graphs, Robots, Knowledge Base, Adversarial, Audio, Speech, Playing Games, Reasoning, Computer Code, Music

Natural Language Processing

Translation

6 benchmarks

12395 papers

Question Answering

406 benchmarks

10817 papers

Machine Translation

126 benchmarks

10752 papers

Sentiment Analysis

170 benchmarks

5630 papers

Information Retrieval

34 benchmarks

4740 papers

Knowledge Distillation

18 benchmarks

4240 papers

Text Classification

261 benchmarks

3635 papers

Named Entity Recognition (NER)

161 benchmarks

2874 papers

Binary Classification

8 benchmarks

2574 papers

Semantic Textual Similarity

40 benchmarks

2381 papers

Visual Question Answering

61 benchmarks

2177 papers

Visual Question Answering (VQA)

230 benchmarks

2167 papers

Specificity

0 benchmarks

2094 papers

Natural Language Understanding

29 benchmarks

1978 papers

Relation Extraction

156 benchmarks

1977 papers

Natural Language Inference

65 benchmarks

1961 papers

Image Captioning

177 benchmarks

1878 papers

NMT

0 benchmarks

1773 papers

Reading Comprehension

28 benchmarks

1760 papers

Code Generation

71 benchmarks

1697 papers

Video Generation

45 benchmarks

1466 papers

Dependency Parsing

33 benchmarks

1407 papers

Prompt Engineering

16 benchmarks

1236 papers

Semantic Parsing

70 benchmarks

1202 papers

Instruction Following

5 benchmarks

1135 papers

Memorization

1 benchmark

1088 papers

Text-to-Image Generation

55 benchmarks

1085 papers

Word Sense Disambiguation

41 benchmarks

1035 papers

Part-Of-Speech Tagging

15 benchmarks

990 papers

Text to Image Generation

1 benchmark

969 papers

Common Sense Reasoning

37 benchmarks

939 papers

Response Generation

8 benchmarks

914 papers

Topic Models

15 benchmarks

881 papers

Coreference Resolution

24 benchmarks

880 papers

Abstractive Text Summarization

41 benchmarks

846 papers

Intrusion Detection

20 benchmarks

800 papers

Cross-Lingual Transfer

55 benchmarks

782 papers

Document Summarization

22 benchmarks

760 papers

Entity Linking

55 benchmarks

735 papers

Question Generation

25 benchmarks

664 papers

Document Classification

23 benchmarks

641 papers

Semantic Role Labeling

10 benchmarks

620 papers

Sentence Embeddings

4 benchmarks

615 papers

Dialogue Generation

42 benchmarks

606 papers

Word Alignment

7 benchmarks

551 papers

POS Tagging

2 benchmarks

523 papers

Cross-Modal Retrieval

60 benchmarks

522 papers

Language Acquisition

1 benchmark

522 papers

Hate Speech Detection

23 benchmarks

507 papers

Open-Domain Question Answering

41 benchmarks

494 papers

Fake News Detection

15 benchmarks

490 papers

Relational Reasoning

7 benchmarks

483 papers

Aspect-Based Sentiment Analysis (ABSA)

63 benchmarks

469 papers

Text Simplification

37 benchmarks

468 papers

Emotion Classification

17 benchmarks

458 papers

Slot Filling

31 benchmarks

458 papers

Morphological Analysis

0 benchmarks

452 papers

Chunking

9 benchmarks

447 papers

Event Extraction

22 benchmarks

446 papers

Relation Classification

20 benchmarks

445 papers

GSM8K

2 benchmarks

439 papers

Text-To-SQL

19 benchmarks

424 papers

Grammatical Error Correction

21 benchmarks

415 papers

Self-Learning

0 benchmarks

404 papers

Word Similarity

1 benchmark

378 papers

Text Matching

0 benchmarks

364 papers

Lemmatization

0 benchmarks

351 papers

Intent Classification

6 benchmarks

344 papers

Stance Detection

33 benchmarks

343 papers

Natural Language Queries

5 benchmarks

337 papers

Intent Detection

46 benchmarks

330 papers

document understanding

0 benchmarks

309 papers

Task-Oriented Dialogue Systems

10 benchmarks

308 papers

Safety Alignment

0 benchmarks

288 papers

Argument Mining

11 benchmarks

284 papers

Sarcasm Detection

11 benchmarks

266 papers

Novelty Detection

0 benchmarks

249 papers

Story Generation

47 benchmarks

235 papers

Explanation Generation

10 benchmarks

235 papers

Data-to-Text Generation

119 benchmarks

219 papers

Fact Verification

7 benchmarks

216 papers

Authorship Attribution

0 benchmarks

212 papers

Paraphrase Generation

6 benchmarks

209 papers

Open Information Extraction

42 benchmarks

207 papers

Constituency Parsing

11 benchmarks

204 papers

Text-to-Video Generation

13 benchmarks

201 papers

Bias Detection

7 benchmarks

199 papers

Model Editing

4 benchmarks

193 papers

Spelling Correction

1 benchmark

193 papers

Discourse Parsing

16 benchmarks

189 papers

Text Style Transfer

4 benchmarks

186 papers

Protein Folding

0 benchmarks

185 papers

Entity Resolution

21 benchmarks

184 papers

De-identification

0 benchmarks

174 papers

Paraphrase Identification

19 benchmarks

172 papers

Keyword Extraction

9 benchmarks

172 papers

Entity Typing

18 benchmarks

170 papers

Document Ranking

3 benchmarks

168 papers

Abusive Language

0 benchmarks

166 papers

Term Extraction

7 benchmarks

160 papers

Entity Disambiguation

12 benchmarks

156 papers

Conversational Search

0 benchmarks

154 papers

Keyphrase Extraction

7 benchmarks

153 papers

Sentence Compression

2 benchmarks

149 papers

Lexical Simplification

0 benchmarks

147 papers

Speech-to-Text Translation

14 benchmarks

146 papers

Conversational Question Answering

2 benchmarks

142 papers

Morphological Inflection

0 benchmarks

135 papers

text annotation

0 benchmarks

127 papers

Entity Extraction using GAN

0 benchmarks

127 papers

Word Translation

0 benchmarks

125 papers

Text Clustering

28 benchmarks

123 papers

Spam detection

1 benchmark

117 papers

AMR Parsing

10 benchmarks

117 papers

Visual Storytelling

24 benchmarks

115 papers

Semantic Composition

0 benchmarks

110 papers

Deep Attention

0 benchmarks

109 papers

Multimodal Machine Translation

10 benchmarks

108 papers

Word Sense Induction

3 benchmarks

107 papers

Document-level Relation Extraction

6 benchmarks

106 papers

Automated Essay Scoring

1 benchmark

104 papers

Stock Prediction

10 benchmarks

102 papers

Few-Shot Text Classification

19 benchmarks

100 papers

Token Classification

2 benchmarks

99 papers

Rumour Detection

2 benchmarks

98 papers

Dialogue Evaluation

4 benchmarks

97 papers

Multilingual NLP

0 benchmarks

96 papers

Twitter Sentiment Analysis

1 benchmark

96 papers

Extractive Text Summarization

13 benchmarks

95 papers

Morphological Tagging

0 benchmarks

95 papers

Aspect Extraction

8 benchmarks

92 papers

3D Action Recognition

149 benchmarks

91 papers

Sentence Completion

2 benchmarks

91 papers

Phrase Grounding

10 benchmarks

88 papers

Temporal Relation Extraction

7 benchmarks

88 papers

Semantic Retrieval

1 benchmark

86 papers

Temporal Information Extraction

17 benchmarks

86 papers

Long-Context Understanding

26 benchmarks

81 papers

Dialogue Understanding

34 benchmarks

79 papers

Emotional Intelligence

2 benchmarks

77 papers

Data-free Knowledge Distillation

4 benchmarks

75 papers

Key Information Extraction

9 benchmarks

74 papers

Passage Ranking

1 benchmark

73 papers

Abuse Detection

23 benchmarks

73 papers

Linguistic Acceptability

10 benchmarks

72 papers

Cloze Test

12 benchmarks

71 papers

Table-to-Text Generation

30 benchmarks

68 papers

Authorship Verification

0 benchmarks

68 papers

Complex Word Identification

0 benchmarks

67 papers

Conditional Text Generation

1 benchmark

67 papers

Humor Detection

1 benchmark

64 papers

Subjectivity Analysis

2 benchmarks

63 papers

Question Similarity

1 benchmark

62 papers

Propaganda detection

0 benchmarks

61 papers

Open-Domain Dialog

10 benchmarks

60 papers

Attribute Extraction

1 benchmark

59 papers

Image-to-Text Retrieval

19 benchmarks

59 papers

Meme Classification

12 benchmarks

59 papers

Source Code Summarization

13 benchmarks

58 papers

Lexical Analysis

0 benchmarks

56 papers

Negation Detection

4 benchmarks

55 papers

Goal-Oriented Dialog

4 benchmarks

54 papers

Meeting Summarization

2 benchmarks

53 papers

Review Generation

0 benchmarks

51 papers

Punctuation Restoration

0 benchmarks

51 papers

Sentence Ordering

1 benchmark

50 papers

Hallucination Evaluation

0 benchmarks

49 papers

Graph-to-Sequence

2 benchmarks

48 papers

Lexical Normalization

1 benchmark

47 papers

Hope Speech Detection

2 benchmarks

47 papers

Decipherment

0 benchmarks

46 papers

Conversational Response Selection

35 benchmarks

46 papers

Arabic Sentiment Analysis

1 benchmark

42 papers

Intent Discovery

3 benchmarks

42 papers

Ad-Hoc Information Retrieval

6 benchmarks

41 papers

Document AI

1 benchmark

40 papers

Sign Language Production

0 benchmarks

40 papers

Recipe Generation

14 benchmarks

40 papers

Weakly Supervised Classification

2 benchmarks

39 papers

Code Repair

6 benchmarks

39 papers

Sentence-Pair Classification

0 benchmarks

38 papers

Attribute Value Extraction

4 benchmarks

35 papers

Morphological Disambiguation

0 benchmarks

34 papers

Age And Gender Classification

3 benchmarks

34 papers

Hypernym Discovery

9 benchmarks

33 papers

Passage Re-Ranking

2 benchmarks

32 papers

Aspect Category Detection

9 benchmarks

31 papers

News Generation

1 benchmark

29 papers

Aggression Identification

0 benchmarks

27 papers

Winogrande

0 benchmarks

26 papers

Table-based Fact Verification

2 benchmarks

26 papers

Pretrained Multilingual Language Models

0 benchmarks

26 papers

Cross-Lingual Document Classification

14 benchmarks

25 papers

CCG Supertagging

1 benchmark

24 papers

KG-to-Text Generation

35 benchmarks

22 papers

Dialog Act Classification

1 benchmark

22 papers

Cross-Lingual Entity Linking

1 benchmark

22 papers

Extreme Summarization

10 benchmarks

22 papers

Conversational Response Generation

7 benchmarks

22 papers

Scientific Document Summarization

7 benchmarks

22 papers

Inductive knowledge graph completion

98 benchmarks

20 papers

Binary text classification

9 benchmarks

20 papers

Probing Language Models

1 benchmark

20 papers

Relationship Extraction (Distant Supervised)

8 benchmarks

19 papers

Toponym Resolution

0 benchmarks

19 papers

Vietnamese Datasets

0 benchmarks

17 papers

Gender Bias Detection

0 benchmarks

17 papers

Action Parsing

1 benchmark

15 papers

Polyphone disambiguation

1 benchmark

15 papers

Semantic entity labeling

2 benchmarks

14 papers

Fact Selection

1 benchmark

14 papers

Author Attribution

0 benchmarks

13 papers

Commonsense Causal Reasoning

0 benchmarks

13 papers

Persian Sentiment Analysis

0 benchmarks

13 papers

Nested Mention Recognition

2 benchmarks

11 papers

Multi-agent Integration

1 benchmark

9 papers

answerability prediction

1 benchmark

9 papers

Zero-Shot Machine Translation

0 benchmarks

8 papers

Definition Modelling

0 benchmarks

8 papers

Summarization

11 benchmarks

8 papers

Open Intent Discovery

16 benchmarks

7 papers

Multimodal Text and Image Classification

28 benchmarks

7 papers

Multimodal Association

1 benchmark

7 papers

Dialogue Rewriting

9 benchmarks

7 papers

News Annotation

0 benchmarks

6 papers

Handwriting Verification

3 benchmarks

6 papers

Aspect-Category-Opinion-Sentiment Quadruple Extraction

18 benchmarks

6 papers

Zero-shot Sentiment Classification

1 benchmark

6 papers

Cross-Lingual Bitext Mining

4 benchmarks

6 papers

Syntax Representation

0 benchmarks

6 papers

Morpheme Segmentation

3 benchmarks

6 papers

Reading Order Detection

3 benchmarks

6 papers

Aspect Category Polarity

1 benchmark

6 papers

Binary Condescension Detection

1 benchmark

5 papers

Job classification

0 benchmarks

5 papers

Stereotypical Bias Analysis

10 benchmarks

5 papers

Multi-label Condescension Detection

1 benchmark

5 papers

Text2text Generation

1 benchmark

5 papers

Text Effects Transfer

0 benchmarks

5 papers

AI and Safety

0 benchmarks

4 papers

Text-Variation

0 benchmarks

4 papers

Text-to-video search

0 benchmarks

4 papers

Chemical Indexing

1 benchmark

4 papers

Twitter Event Detection

3 benchmarks

4 papers

Logical Reasoning Reading Comprehension

0 benchmarks

4 papers

Attribute Mining

3 benchmarks

4 papers

Recognizing Emotion Cause in Conversations

10 benchmarks

3 papers

Personality Recognition in Conversation

7 benchmarks

3 papers

Phrase Ranking

4 benchmarks

3 papers

Conversational Web Navigation

4 benchmarks

3 papers

Memex Question Answering

1 benchmark

3 papers

Phrase Tagging

6 benchmarks

3 papers

AMR Graph Similarity

2 benchmarks

3 papers

Record linking

0 benchmarks

3 papers

Math Information Retrieval

4 benchmarks

2 papers

Turkish Text Diacritization

1 benchmark

2 papers

Hate Span Identification

0 benchmarks

2 papers

Hungarian Text Diacritization

1 benchmark

2 papers

Workflow Discovery

4 benchmarks

2 papers

Negation and Speculation Scope resolution

0 benchmarks

2 papers

Role-filler Entity Extraction

1 benchmark

2 papers

Open Relation Modeling

0 benchmarks

2 papers

ValNov

6 benchmarks

2 papers

SemEval-2022 Task 4-1 (Binary PCL Detection)

1 benchmark

2 papers

Czech Text Diacritization

1 benchmark

2 papers

Slovak Text Diacritization

1 benchmark

2 papers

Irish Text Diacritization

1 benchmark

2 papers

Vietnamese Text Diacritization

1 benchmark

2 papers

Croatian Text Diacritization

1 benchmark

2 papers

Context Query Reformulation

0 benchmarks

2 papers

French Text Diacritization

1 benchmark

2 papers

Latvian Text Diacritization

1 benchmark

2 papers

Spanish Text Diacritization

1 benchmark

2 papers

Description-guided molecule generation

1 benchmark

2 papers

Negation and Speculation Cue Detection

2 benchmarks

2 papers

Romanian Text Diacritization

1 benchmark

2 papers

Text-to-GQL

0 benchmarks

2 papers

Speaker Attribution in German Parliamentary Debates (GermEval 2023, subtask 1)

1 benchmark

1 paper

Clinical Assertion Status Detection

1 benchmark

1 paper

GermEval2024 Shared Task 1 Subtask 1

1 benchmark

1 paper

Crowdsourced Text Aggregation

2 benchmarks

1 paper

Joint Entity and Relation Extraction on Scientific Data

0 benchmarks

1 paper

Multilingual Machine Comprehension in English Hindi

8 benchmarks

1 paper

multi-word expression sememe prediction

0 benchmarks

1 paper

Asynchronous Group Communication

0 benchmarks

1 paper

NLP-based Person Retrieval

2 benchmarks

1 paper

Multimodal Text Prediction

2 benchmarks

1 paper

GermEval2024 Shared Task 1 Subtask 2

1 benchmark

1 paper

TinyQA Benchmark++

2 benchmarks

1 paper

Multi-Grained Named Entity Recognition

0 benchmarks

1 paper

multi-word expression embedding

0 benchmarks

1 paper

Question-Answer categorization

4 benchmarks

1 paper

Poem meters classification

1 benchmark

1 paper

Optical Character Recognition

1 benchmark

0 papers

Multilingual Neural Machine Translation

0 benchmarks

0 papers

Question to Declarative Sentence

0 benchmarks

0 papers

Chinese

14 benchmarks

0 papers

Clickbait Detection

0 benchmarks

0 papers

Web Page Tagging

0 benchmarks

0 papers

Overlapping Mention Recognition

0 benchmarks

0 papers

Cross-Lingual

70 benchmarks

0 papers

Abstract Argumentation

0 benchmarks

0 papers

Language Modeling

0 benchmarks

0 papers

Joint NER and Classification

0 benchmarks

0 papers

Counterspeech Detection

1 benchmark

0 papers

Natural Language Transduction

9 benchmarks

0 papers

Anaphora Resolution

1 benchmark

0 papers

Query Wellformedness

1 benchmark

0 papers

Automatic Writing

0 benchmarks

0 papers

Phrase Vector Embedding

0 benchmarks

0 papers

incongruity detection

0 benchmarks

0 papers

Text Attribute Transfer

0 benchmarks

0 papers

Relation Mention Extraction

0 benchmarks

0 papers

Sentence Pair Modeling

13 benchmarks

0 papers

Speculation Detection

3 benchmarks

0 papers

Automated Writing Evaluation

0 benchmarks

0 papers

Meme Captioning

0 benchmarks

0 papers

Sentence Summarization

0 benchmarks

0 papers

Thai Word Segmentation

2 benchmarks

0 papers

Misogynistic Aggression Identification

0 benchmarks

0 papers

Commonsense Reasoning for RL

1 benchmark

0 papers

Text Compression

0 benchmarks

0 papers

Extractive Tags Summarization

0 benchmarks

0 papers

Face Selection

0 benchmarks

0 papers

Domain Labelling

1 benchmark

0 papers

Turning Point Identification

0 benchmarks

0 papers

Temporal Processing

21 benchmarks

0 papers

Taxonomy Learning

9 benchmarks

0 papers

Shallow Syntax

9 benchmarks

0 papers

Chinese Spelling Error Correction

0 benchmarks

0 papers

Vietnamese Word Segmentation

0 benchmarks

0 papers

Cognate Prediction

0 benchmarks

0 papers

Emergent communications on relations

0 benchmarks

0 papers

Chinese Spell Checking

2 benchmarks

0 papers

Complaint Comment Classification

0 benchmarks

0 papers

Reliable Intelligence Identification

0 benchmarks

0 papers

Vietnamese Parsing

0 benchmarks

0 papers

Job Prediction

0 benchmarks

0 papers

Diachronic Word Embeddings

0 benchmarks

0 papers

Vietnamese Aspect-Based Sentiment Analysis

0 benchmarks

0 papers

Text Anonymization

0 benchmarks

0 papers

Cross-Lingual Word Embeddings

0 benchmarks

0 papers

ARQMath2

0 benchmarks

0 papers

Suggestion mining

0 benchmarks

0 papers

Hate Speech Normalization

0 benchmarks

0 papers

Hate Intensity Prediction

0 benchmarks

0 papers

Reverse Dictionary

0 benchmarks

0 papers

Continual Named Entity Recognition

6 benchmarks

0 papers

Readability optimization

0 benchmarks

0 papers

Emotion Detection and Trigger Summarization

0 benchmarks

0 papers

Comment Generation

0 benchmarks

0 papers

Multi-lingual Text-to-Image Generation

0 benchmarks

0 papers

Cross-lingual Text-to-Image Generation

0 benchmarks

0 papers

Coding Problem Tagging

0 benchmarks

0 papers

Collaborative Plan Acquisition

0 benchmarks

0 papers

molecular representation

0 benchmarks

0 papers

Legal Reasoning

2 benchmarks

0 papers

Linguistic steganography

0 benchmarks

0 papers

trustable and focussed LLM generated content

0 benchmarks

0 papers

Japanese Word Segmentation

1 benchmark

0 papers

WNLI

0 benchmarks

0 papers

Personality Generation

0 benchmarks

0 papers

In-Context Learning

0 benchmarks

0 papers

nlg evaluation

0 benchmarks

0 papers

Only Connect Walls Dataset Task 2 (Connections)

0 benchmarks

0 papers

Vietnamese Language Models

0 benchmarks

0 papers

Vietnamese Natural Language Understanding

0 benchmarks

0 papers

Social Media Mental Health Detection

0 benchmarks

0 papers

Vietnamese Fact Checking

0 benchmarks

0 papers

Simultaneous Speech-to-Speech Translation

0 benchmarks

0 papers

Vietnamese Scene Text

0 benchmarks

0 papers

Vietnamese Sentiment Analysis

0 benchmarks

0 papers

Joint Multilingual Sentence Representations

0 benchmarks

0 papers

HellaSwag

0 benchmarks

0 papers

Script Generation

0 benchmarks

0 papers

Vietnamese Lexical Normalization

0 benchmarks

0 papers

Vietnamese Hate Speech Detection

0 benchmarks

0 papers

Vietnamese Speech Recognition

0 benchmarks

0 papers

Multi-Dialect Vietnamese

0 benchmarks

0 papers

Political evaluation

0 benchmarks

0 papers

Semi-Supervised Text Regression

0 benchmarks

0 papers

Relevance Detection

0 benchmarks

0 papers

Drug Design

0 benchmarks

0 papers

ArabicMMLU

0 benchmarks

0 papers

Self-Evolving AI

0 benchmarks

0 papers

Philosophical Reflection

0 benchmarks

0 papers

Text Normalization

0 benchmarks

0 papers

Nested Term Recognition

5 benchmarks

0 papers

Retrieval-augmented Generation

0 benchmarks

0 papers

Human Agent Collaboration

0 benchmarks

0 papers