Image Captioning on COCO (Common Objects in Context)
Metric: CIDEr (higher is better)
Results
| # | Model | CIDEr | Extra Data | Paper | Date | Code |
|---|-------|-------|------------|-------|------|------|
| 1 | ExpansionNet v2 | 143.7 | No | Exploiting Multiple Sequence Lengths in Fast End... | 2022-08-13 | Code |
| 2 | M2 Transformer | 131.2 | No | Meshed-Memory Transformer for Image Captioning | 2019-12-17 | Code |
| 3 | IGINet | 131 | No | - | - | - |
| 4 | UNIMO-large | 127.7 | No | UNIMO: Towards Unified-Modal Understanding and G... | 2020-12-31 | Code |
| 5 | RDN | 125.2 | No | Reflective Decoding Network for Image Captioning | 2019-08-30 | - |
| 6 | Lyrics | 121.1 | No | Lyrics: Boosting Fine-grained Language-Vision Al... | 2023-12-08 | - |
| 7 | Bit Diffusion (20 steps) | 115 | No | Analog Bits: Generating Discrete Data using Diff... | 2022-08-08 | Code |
| 8 | Flamingo (80B; 4-shot) | 103 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 9 | RA-CM3 (2.7B) | 89.1 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 10 | Flamingo (3B; 4-shot) | 85 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 11 | Perturb, Predict & Paraphrase | 84.5 | No | - | - | Code |
| 12 | Parti | 83.9 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 13 | NIC (ResNet-50, CutMix) | 77.6 | No | CutMix: Regularization Strategy to Train Strong ... | 2019-05-13 | Code |
| 14 | Vanilla CM3 | 71.9 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 15 | X-LXMERT | 55.8 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 16 | minDALL-E | 48 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 17 | ruDALL-E-XL | 38.7 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |
| 18 | DALL-E | 20.2 | No | Retrieval-Augmented Multimodal Language Modeling | 2022-11-22 | - |