Image Captioning on nocaps entire
Metric: SPICE (higher is better)
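SPICE scores a generated caption by parsing both the candidate and the reference captions into scene graphs, extracting semantic tuples (objects, object-attribute pairs, and subject-relation-object triples), and computing an F-score between the candidate tuples and the reference tuples. The sketch below illustrates only that final scoring step and assumes the tuples have already been extracted. It uses exact tuple matching, whereas the official implementation parses captions with a dependency parser and also accepts WordNet synonym matches, so its numbers will not reproduce the scores in this table. The names `spice_f1` and `corpus_spice` are hypothetical, and the ×100 scaling in `corpus_spice` is an assumption based on the magnitude of the scores reported here.

```python
from typing import Iterable, List, Set, Tuple

# A semantic tuple: (object,), (object, attribute), or (subject, relation, object).
SemTuple = Tuple[str, ...]


def spice_f1(candidate: Set[SemTuple], references: Iterable[Set[SemTuple]]) -> float:
    """Simplified per-image SPICE: F1 between candidate tuples and the
    union of all reference tuples, using exact matching only
    (assumption: no WordNet synonym matching, tuples pre-extracted)."""
    ref_union: Set[SemTuple] = set().union(*references)
    if not candidate or not ref_union:
        return 0.0
    matched = candidate & ref_union
    precision = len(matched) / len(candidate)  # share of candidate tuples supported by references
    recall = len(matched) / len(ref_union)     # share of reference tuples covered by the candidate
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def corpus_spice(pairs: List[Tuple[Set[SemTuple], List[Set[SemTuple]]]]) -> float:
    """Average the per-image F-scores and scale to 0-100 (assumed scale)."""
    scores = [spice_f1(cand, refs) for cand, refs in pairs]
    return 100.0 * sum(scores) / len(scores) if scores else 0.0
```

For example, a candidate with tuples (dog), (dog, brown), (dog, on, grass) scored against references whose union is (dog), (grass), (dog, on, grass), (dog, small) matches 2 of 3 candidate tuples and 2 of 4 reference tuples, giving precision 2/3, recall 1/2, and F1 4/7.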
Results
| # | Model | SPICE | Extra Data | Paper | Date | Code | SOTA |
|---|-------|-------|------------|-------|------|------|------|
| 1 | GIT, Single Model | 15.94 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes | Yes |
| 2 | CoCa - Google Brain | 15.47 | No | - | - | - | - |
| 3 | Prismer | 14.91 | No | Prismer: A Vision-Language Model with Multi-Task Experts | 2023-03-04 | Yes | - |
| 4 | Microsoft Cognitive Services team | 14.85 | No | Scaling Up Vision-Language Pre-training for Image Captioning | 2021-11-24 | - | Yes |
| 5 | firethehole | 14.74 | No | - | - | - | - |
| 6 | FudanFVL | 14.72 | No | - | - | - | - |
| 7 | Human | 14.67 | No | - | - | - | - |
| 8 | FudanWYZ | 14.56 | No | - | - | - | - |
| 9 | Single Model | 14.49 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Yes | Yes |
| 10 | vll@mk514 | 14.06 | No | - | - | - | - |
| 11 | IEDA-LAB | 13.90 | No | - | - | - | - |
| 12 | MD | 13.35 | No | - | - | - | - |
| 13 | VinVL (Microsoft Cognitive Services + MSR) | 13.07 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Yes | Yes |
| 14 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 12.81 | No | - | - | - | - |
| 15 | RCAL | 12.20 | No | - | - | - | - |
| 16 | evertyhing | 12.10 | No | - | - | - | - |
| 17 | icgp2ssi1_coco_si_0.02_5_test | 12.01 | No | - | - | - | - |
| 18 | vinvl_yuan_cbs | 11.90 | No | - | - | - | - |
| 19 | camel XE | 11.89 | No | - | - | - | - |
| 20 | MQ-UpDown-C | 11.68 | No | - | - | - | - |
| 21 | Xinyi | 11.62 | No | - | - | - | - |
| 22 | cxy_nocaps_training | 11.57 | No | - | - | - | - |
| 23 | Oscar | 11.29 | No | - | - | - | - |
| 24 | UpDown + ELMo + CBS | 11.20 | No | - | - | - | - |
| 25 | ClipCap (MLP + GPT2 tuning) | 11.10 | No | ClipCap: CLIP Prefix for Image Captioning | 2021-11-18 | Yes | - |
| 26 | 7_10-7_40000_predict_test.json | 10.96 | No | - | - | - | - |
| 27 | ClipCap (Transformer) | 10.86 | No | ClipCap: CLIP Prefix for Image Captioning | 2021-11-18 | Yes | - |
| 28 | nocaps_training | 10.14 | No | - | - | - | - |
| 29 | UpDown | 10.14 | No | - | - | - | - |
| 30 | None | 10.10 | No | - | - | - | - |
| 31 | Neural Baby Talk + CBS | 9.69 | No | - | - | - | - |
| 32 | area_attention | 9.56 | No | - | - | - | - |
| 33 | YX | 9.54 | No | - | - | - | - |
| 34 | B2 | 9.42 | No | - | - | - | - |
| 35 | Neural Baby Talk | 9.15 | No | - | - | - | - |
| 36 | coco_all_19 | 9.13 | No | - | - | - | - |
| 37 | Yu-Wu | 8.35 | No | - | - | - | - |
| 38 | CS395T | 8.20 | No | - | - | - | - |