Image Captioning on nocaps in-domain
Metric: CIDEr (higher is better)
Results
| # | Model | CIDEr | Extra Data | Paper | Date | Code | Badge |
|---|-------|-------|------------|-------|------|------|-------|
| 1 | PaLI | 149.1 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Code | SOTA |
| 2 | GIT2, Single Model | 124.18 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code | SOTA |
| 3 | GIT, Single Model | 122.4 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code | - |
| 4 | PaLI | 121.09 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Code | - |
| 5 | CoCa - Google Brain | 117.9 | No | - | - | - | - |
| 6 | Microsoft Cognitive Services team | 112.82 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - | SOTA |
| 7 | Single Model | 108.98 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Code | - |
| 8 | GRIT (zero-shot, no VL pretraining, no CBS) | 105.9 | No | GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features | 2022-07-20 | Code | - |
| 9 | FudanFVL | 104.9 | No | - | - | - | - |
| 10 | FudanWYZ | 104.25 | No | - | - | - | - |
| 11 | IEDA-LAB | 102.64 | No | - | - | - | - |
| 12 | vll@mk514 | 101.69 | No | - | - | - | - |
| 13 | MD | 100.03 | No | - | - | - | - |
| 14 | firethehole | 99.9 | No | - | - | - | - |
| 15 | VinVL (Microsoft Cognitive Services + MSR) | 97.99 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Code | - |
| 16 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 96.63 | No | - | - | - | - |
| 17 | camel XE | 88.08 | No | - | - | - | - |
| 18 | evertyhing | 87.86 | No | - | - | - | - |
| 19 | RCAL | 87.28 | No | - | - | - | - |
| 20 | icgp2ssi1_coco_si_0.02_5_test | 87.21 | No | - | - | - | - |
| 21 | cxy_nocaps_training | 85.81 | No | - | - | - | - |
| 22 | 作者给的test文件 | 85.81 | No | - | - | - | - |
| 23 | ClipCap (Transformer) | 84.85 | No | ClipCap: CLIP Prefix for Image Captioning | 2021-11-18 | Code | - |
| 24 | Oscar | 84.83 | No | - | - | - | - |
| 25 | Xinyi | 84.79 | No | - | - | - | - |
| 26 | Human | 80.61 | No | - | - | - | - |
| 27 | MQ-UpDown-C | 80.19 | No | - | - | - | - |
| 28 | ClipCap (MLP + GPT2 tuning) | 79.73 | No | ClipCap: CLIP Prefix for Image Captioning | 2021-11-18 | Code | - |
| 29 | UpDown + ELMo + CBS | 76.02 | No | - | - | - | - |
| 30 | UpDown | 74.27 | No | - | - | - | - |
| 31 | nocaps_training | 74.27 | No | - | - | - | - |
| 32 | 7_10-7_40000_predict_test.json | 73.73 | No | - | - | - | - |
| 33 | None | 70.33 | No | - | - | - | - |
| 34 | YX | 69.59 | No | - | - | - | - |
| 35 | B2 | 68.98 | No | - | - | - | - |
| 36 | area_attention | 67.91 | No | - | - | - | - |
| 37 | coco_all_19 | 64.37 | No | - | - | - | - |
| 38 | Neural Baby Talk + CBS | 62.96 | No | - | - | - | - |
| 39 | Neural Baby Talk | 60.89 | No | - | - | - | - |
| 40 | CS395T | 58.93 | No | - | - | - | - |
| 41 | Yu-Wu | 53.34 | No | - | - | - | - |