Image Captioning on nocaps in-domain
Metric: SPICE (higher is better)
Results
| # | Model | SPICE | Badge | Extra Data | Paper | Date | Code |
|---|-------|-------|-------|------------|-------|------|------|
| 1 | GIT2, Single Model | 16.36 | SOTA | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes |
| 2 | GIT, Single Model | 16.18 | - | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes |
| 3 | PaLI | 15.69 | - | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Yes |
| 4 | CoCa - Google Brain | 15.49 | - | No | - | - | - |
| 5 | Microsoft Cognitive Services team | 15.22 | SOTA | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - |
| 6 | firethehole | 15.17 | - | No | - | - | - |
| 7 | FudanFVL | 15.04 | - | No | - | - | - |
| 8 | vll@mk514 | 14.99 | - | No | - | - | - |
| 9 | Human | 14.99 | - | No | - | - | - |
| 10 | FudanWYZ | 14.85 | - | No | - | - | - |
| 11 | Single Model | 14.6 | - | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Yes |
| 12 | IEDA-LAB | 14.47 | - | No | - | - | - |
| 13 | MD | 14.08 | - | No | - | - | - |
| 14 | VinVL (Microsoft Cognitive Services + MSR) | 13.63 | - | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Yes |
| 15 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 13.61 | - | No | - | - | - |
| 16 | GRIT (zero-shot, no VL pretraining, no CBS) | 13.6 | - | No | GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features | 2022-07-20 | Yes |
| 17 | camel XE | 13.04 | - | No | - | - | - |
| 18 | RCAL | 12.79 | - | No | - | - | - |
| 19 | evertyhing | 12.6 | - | No | - | - | - |
| 20 | MQ-UpDown-C | 12.38 | - | No | - | - | - |
| 21 | cxy_nocaps_training | 12.35 | - | No | - | - | - |
| 22 | 作者给的test文件 | 12.35 | - | No | - | - | - |
| 23 | Xinyi | 12.3 | - | No | - | - | - |
| 24 | icgp2ssi1_coco_si_0.02_5_test | 12.28 | - | No | - | - | - |
| 25 | ClipCap (MLP + GPT2 tuning) | 12.2 | - | No | ClipCap: CLIP Prefix for Image Captioning | 2021-11-18 | Yes |
| 26 | ClipCap (Transformer) | 12.14 | - | No | ClipCap: CLIP Prefix for Image Captioning | 2021-11-18 | Yes |
| 27 | Oscar | 12.06 | - | No | - | - | - |
| 28 | 7_10-7_40000_predict_test.json | 12.04 | - | No | - | - | - |
| 29 | UpDown + ELMo + CBS | 11.8 | - | No | - | - | - |
| 30 | UpDown | 11.47 | - | No | - | - | - |
| 31 | nocaps_training | 11.46 | - | No | - | - | - |
| 32 | None | 11.07 | - | No | - | - | - |
| 33 | YX | 10.94 | - | No | - | - | - |
| 34 | area_attention | 10.87 | - | No | - | - | - |
| 35 | B2 | 10.55 | - | No | - | - | - |
| 36 | Neural Baby Talk + CBS | 10.13 | - | No | - | - | - |
| 37 | coco_all_19 | 10.11 | - | No | - | - | - |
| 38 | Neural Baby Talk | 9.81 | - | No | - | - | - |
| 39 | Yu-Wu | 9.16 | - | No | - | - | - |
| 40 | CS395T | 8.91 | - | No | - | - | - |