Image Captioning on nocaps out-of-domain
Metric: SPICE (higher is better)
Results
| # | Model | SPICE | Extra Data | Paper | Date | Code |
|---|-------|-------|------------|-------|------|------|
| 1 | GIT, Single Model | 15.7 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 2 | GIT2, Single Model | 15.62 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code |
| 3 | PaLI | 15.49 | No | PaLI: A Jointly-Scaled Multilingual Language-Image Model | 2022-09-14 | Code |
| 4 | CoCa - Google Brain | 15.13 | No | - | - | - |
| 5 | FudanFVL | 14.21 | No | - | - | - |
| 6 | Human | 14.21 | No | - | - | - |
| 7 | Single Model | 13.89 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Code |
| 8 | firethehole | 13.87 | No | - | - | - |
| 9 | FudanWYZ | 13.75 | No | - | - | - |
| 10 | Microsoft Cognitive Services team | 13.74 | No | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | 2020-09-28 | - |
| 11 | IEDA-LAB | 12.52 | No | - | - | - |
| 12 | vll@mk514 | 12.14 | No | - | - | - |
| 13 | MD | 11.59 | No | - | - | - |
| 14 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 11.53 | No | - | - | - |
| 15 | VinVL (Microsoft Cognitive Services + MSR) | 11.48 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Code |
| 16 | icgp2ssi1_coco_si_0.02_5_test | 11.43 | No | - | - | - |
| 17 | evertyhing | 11.18 | No | - | - | - |
| 18 | GRIT (zero-shot, no CBS, no VL pretraining, single model) | 11.1 | No | GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features | 2022-07-20 | Code |
| 19 | RCAL | 10.68 | No | - | - | - |
| 20 | vinvl_yuan_cbs | 10.57 | No | - | - | - |
| 21 | UpDown-C | 10.15 | No | - | - | - |
| 22 | Xinyi | 10.05 | No | - | - | - |
| 23 | cxy_nocaps_training | 10.01 | No | - | - | - |
| 24 | camel XE | 9.9 | No | - | - | - |
| 25 | UpDown + ELMo + CBS | 9.74 | No | - | - | - |
| 26 | Oscar | 9.72 | No | - | - | - |
| 27 | ClipCap (MLP + GPT2 tuning) | 9.7 | No | ClipCap: CLIP Prefix for Image Captioning | 2021-11-18 | Code |
| 28 | ClipCap (Transformer) | 9.57 | No | ClipCap: CLIP Prefix for Image Captioning | 2021-11-18 | Code |
| 29 | Check | 9.39 | No | - | - | - |
| 30 | 7_10-7_40000_predict_test.json | 9.35 | No | - | - | - |
| 31 | Neural Baby Talk + CBS | 8.77 | No | - | - | - |
| 32 | Neural Baby Talk | 8.2 | No | - | - | - |
| 33 | nocaps_training | 8.08 | No | - | - | - |
| 34 | UpDown | 8.08 | No | - | - | - |
| 35 | area_attention | 7.72 | No | - | - | - |
| 36 | Yu-Wu | 7.62 | No | - | - | - |
| 37 | B2 | 7.61 | No | - | - | - |
| 38 | YX | 7.52 | No | - | - | - |
| 39 | coco_all_19 | 7.4 | No | - | - | - |
| 40 | CS395T | 7.2 | No | - | - | - |

Marked SOTA: #1 GIT, Single Model; #7 Single Model; #10 Microsoft Cognitive Services team.