Image Captioning on nocaps out-of-domain
Metric: CIDEr (higher is better)
Results
| # | Model | CIDEr | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | PaLI | 126.67 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Code |
| 2 | GIT2, Single Model | 122.27 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 3 | GIT, Single Model | 122.04 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 4 | CoCa - Google Brain | 121.69 | No | - | - | - |
| 5 | Microsoft Cognitive Services team | 110.14 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - |
| 6 | Single Model | 109.49 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Code |
| 7 | FudanFVL | 106.55 | No | - | - | - |
| 8 | FudanWYZ | 103.75 | No | - | - | - |
| 9 | Human | 91.62 | No | - | - | - |
| 10 | firethehole | 88.54 | No | - | - | - |
| 11 | IEDA-LAB | 87.51 | No | - | - | - |
| 12 | icgp2ssi1_coco_si_0.02_5_test | 87.15 | No | - | - | - |
| 13 | evertyhing | 85.18 | No | - | - | - |
| 14 | vll@mk514 | 78.91 | No | - | - | - |
| 15 | VinVL (Microsoft Cognitive Services + MSR) | 78.01 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Code |
| 16 | MD | 77.39 | No | - | - | - |
| 17 | RCAL | 75.39 | No | - | - | - |
| 18 | Oscar | 73.75 | No | - | - | - |
| 19 | GRIT (zero-shot, no CBS, no VL pretraining, single model) | 72.6 | No | GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features | 2022-07-20 | Code |
| 20 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 72.13 | No | - | - | - |
| 21 | vinvl_yuan_cbs | 71.43 | No | - | - | - |
| 22 | UpDown-C | 70.21 | No | - | - | - |
| 23 | Xinyi | 68.92 | No | - | - | - |
| 24 | cxy_nocaps_training | 68.5 | No | - | - | - |
| 25 | UpDown + ELMo + CBS | 66.67 | No | - | - | - |
| 26 | Neural Baby Talk + CBS | 58.48 | No | - | - | - |
| 27 | camel XE | 54.56 | No | - | - | - |
| 28 | ClipCap (MLP + GPT2 tuning) | 49.35 | No | ClipCap: CLIP Prefix for Image Captioning | 2021-11-18 | Code |
| 29 | ClipCap (Transformer) | 49.14 | No | ClipCap: CLIP Prefix for Image Captioning | 2021-11-18 | Code |
| 30 | Neural Baby Talk | 48.73 | No | - | - | - |
| 31 | 7_10-7_40000_predict_test.json | 43.2 | No | - | - | - |
| 32 | Yu-Wu | 39.39 | No | - | - | - |
| 33 | Check | 36.12 | No | - | - | - |
| 34 | nocaps_training | 30.09 | No | - | - | - |
| 35 | UpDown | 30.09 | No | - | - | - |
| 36 | area_attention | 26.55 | No | - | - | - |
| 37 | YX | 26.25 | No | - | - | - |
| 38 | B2 | 25.91 | No | - | - | - |
| 39 | coco_all_19 | 23.07 | No | - | - | - |
| 40 | CS395T | 21.3 | No | - | - | - |

Entries carrying the SOTA badge: #1 PaLI, #2 GIT2, Single Model, #5 Microsoft Cognitive Services team.