Image Captioning on COCO (Common Objects in Context)
Metric: CIDEr (higher is better)
Results
| # | Model | CIDEr | Extra Data | SOTA | Paper | Date | Code |
|---|-------|-------|------------|------|-------|------|------|
| 1 | ExpansionNet v2 | 143.7 | No | Yes | Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning | 2022-08-13 | Yes |
| 2 | M2 Transformer | 131.2 | No | Yes | Meshed-Memory Transformer for Image Captioning | 2019-12-17 | Yes |
| 3 | IGINet | 131 | No | - | - | - | - |
| 4 | UNIMO-large | 127.7 | No | - | UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning | 2020-12-31 | Yes |
| 5 | RDN | 125.2 | No | Yes | Reflective Decoding Network for Image Captioning | 2019-08-30 | - |
| 6 | Lyrics | 121.1 | No | - | Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | 2023-12-08 | - |
| 7 | Bit Diffusion (20 steps) | 115 | No | - | Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning | 2022-08-08 | Yes |
| 8 | Flamingo (80B; 4-shot) | 103 | No | - | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 9 | RA-CM3 (2.7B) | 89.1 | No | - | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 10 | Flamingo (3B; 4-shot) | 85 | No | - | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 11 | Perturb, Predict & Paraphrase | 84.5 | No | - | - | - | Yes |
| 12 | Parti | 83.9 | No | - | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 13 | NIC (ResNet-50, CutMix) | 77.6 | No | Yes | CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features | 2019-05-13 | Yes |
| 14 | Vanilla CM3 | 71.9 | No | - | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 15 | X-LXMERT | 55.8 | No | - | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 16 | minDALL-E | 48 | No | - | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 17 | ruDALL-E-XL | 38.7 | No | - | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 18 | DALL-E | 20.2 | No | - | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
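The ranking is a descending sort on CIDEr, since higher is better. The SOTA marker appears to flag entries that beat every earlier-dated result, though the page does not state that rule, so treat it as an assumption. The sketch below is a minimal illustration using a handful of rows copied from the results; the names `ROWS`, `rank_by_cider`, and `sota_progression` are hypothetical helpers, not part of any leaderboard API.

```python
from datetime import date

# A few (model, CIDEr, publication date) rows copied from the results;
# the date is None where the leaderboard lists no paper.
ROWS = [
    ("UNIMO-large", 127.7, date(2020, 12, 31)),
    ("NIC (ResNet-50, CutMix)", 77.6, date(2019, 5, 13)),
    ("ExpansionNet v2", 143.7, date(2022, 8, 13)),
    ("IGINet", 131.0, None),
    ("RDN", 125.2, date(2019, 8, 30)),
    ("M2 Transformer", 131.2, date(2019, 12, 17)),
]


def rank_by_cider(rows):
    """Return rows sorted best first (higher CIDEr is better)."""
    return sorted(rows, key=lambda r: r[1], reverse=True)


def sota_progression(rows):
    """Return dated rows that beat every earlier-dated row.

    Assumption: this is how the SOTA marker is assigned; undated
    rows are skipped because they cannot be placed on the timeline.
    """
    dated = sorted((r for r in rows if r[2] is not None), key=lambda r: r[2])
    best = float("-inf")
    progression = []
    for row in dated:
        if row[1] > best:
            best = row[1]
            progression.append(row)
    return progression


if __name__ == "__main__":
    for rank, (model, cider, _) in enumerate(rank_by_cider(ROWS), start=1):
        print(rank, model, cider)
    print([model for model, _, _ in sota_progression(ROWS)])
```

On these rows the progression comes out as NIC (ResNet-50, CutMix), RDN, M2 Transformer, and ExpansionNet v2, which matches the entries carrying the SOTA marker in the results.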