Image Captioning on nocaps entire
Metric: B1 (higher is better)
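The page does not define the metric. Assuming B1 denotes BLEU-1 (clipped unigram precision multiplied by a brevity penalty, reported on a 0 to 100 scale), a minimal single-caption sketch could look like the following. The function name `bleu1` and the lowercase whitespace tokenization are illustrative choices, not taken from the leaderboard; an official evaluation would typically use its own tokenizer and aggregate counts over the whole test set, so scores from this sketch are not comparable with the numbers listed here.

```python
import math
from collections import Counter


def bleu1(candidate, references):
    """Sentence-level BLEU-1 sketch on a 0-100 scale.

    Assumptions (not from the leaderboard): lowercase whitespace
    tokenization and scoring of a single caption in isolation.
    """
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0

    # For each token, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in refs:
        for tok, n in Counter(ref).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], n)

    # Clipped unigram matches: a candidate token is credited at most as
    # many times as it occurs in the most generous reference.
    clipped = sum(min(n, max_ref_counts[tok]) for tok, n in Counter(cand).items())
    precision = clipped / len(cand)

    # Brevity penalty against the reference length closest to the
    # candidate length (ties resolved toward the shorter reference).
    ref_len = min((len(r) for r in refs), key=lambda n: (abs(n - len(cand)), n))
    bp = 1.0 if len(cand) > ref_len else math.exp(1.0 - ref_len / len(cand))

    return 100.0 * bp * precision


if __name__ == "__main__":
    score = bleu1("a dog runs on the grass", ["a dog is running on the grass"])
    print(round(score, 2))
```

In the usage example, five of the six candidate tokens appear in the reference, and the candidate is one token shorter than the reference, so the brevity penalty lowers the score below the raw precision.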
Results
# | Model | B1 | Extra Data | Paper | Date | Code
1 | GIT, Single Model | 88.1 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Code
2 | CoCa - Google Brain | 87.01 | No | - | - | -
3 | Microsoft Cognitive Services team | 85.62 | No | Scaling Up Vision-Language Pre-training for Image Captioning | 2021-11-24 | -
4 | Prismer | 84.87 | No | Prismer: A Vision-Language Model with Multi-Task Experts | 2023-03-04 | Code
5 | FudanFVL | 83.9 | No | - | - | -
6 | Single Model | 83.78 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Code
7 | IEDA-LAB | 83.25 | No | - | - | -
8 | FudanWYZ | 82.95 | No | - | - | -
9 | MD | 82.43 | No | - | - | -
10 | vll@mk514 | 81.61 | No | - | - | -
11 | VinVL (Microsoft Cognitive Services + MSR) | 81.59 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Code
12 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 81.03 | No | - | - | -
13 | firethehole | 80.77 | No | - | - | -
14 | Oscar | 79.57 | No | - | - | -
15 | vinvl_yuan_cbs | 79.32 | No | - | - | -
16 | icgp2ssi1_coco_si_0.02_5_test | 79 | No | - | - | -
17 | evertyhing | 78.92 | No | - | - | -
18 | cxy_nocaps_training | 78.75 | No | - | - | -
19 | Xinyi | 78.58 | No | - | - | -
20 | RCAL | 78.19 | No | - | - | -
21 | camel XE | 77.97 | No | - | - | -
22 | MQ-UpDown-C | 76.89 | No | - | - | -
23 | Human | 76.64 | No | - | - | -
24 | UpDown + ELMo + CBS | 76.59 | No | - | - | -
25 | nocaps_training | 74 | No | - | - | -
26 | UpDown | 74 | No | - | - | -
27 | Neural Baby Talk + CBS | 73.42 | No | - | - | -
28 | B2 | 73.04 | No | - | - | -
29 | YX | 72.78 | No | - | - | -
30 | 7_10-7_40000_predict_test.json | 72.49 | No | - | - | -
31 | Neural Baby Talk | 72.33 | No | - | - | -
32 | area_attention | 72.02 | No | - | - | -
33 | None | 71.69 | No | - | - | -
34 | coco_all_19 | 69.44 | No | - | - | -
35 | CS395T | 69.07 | No | - | - | -
36 | Yu-Wu | 67.85 | No | - | - | -
Entries with a linked paper:
#1 GIT, Single Model (SOTA) · 88.1 B1 · 2022-05-27 · GIT: A Generative Image-to-text Transformer for Vision and Language · Code
#3 Microsoft Cognitive Services team (SOTA) · 85.62 B1 · 2021-11-24 · Scaling Up Vision-Language Pre-training for Image Captioning
#4 Prismer · 84.87 B1 · 2023-03-04 · Prismer: A Vision-Language Model with Multi-Task Experts · Code
#6 Single Model (SOTA) · 83.78 B1 · 2021-08-24 · SimVLM: Simple Visual Language Model Pretraining with Weak Supervision · Code
#11 VinVL (Microsoft Cognitive Services + MSR) (SOTA) · 81.59 B1 · 2021-01-02 · VinVL: Revisiting Visual Representations in Vision-Language Models · Code
The remaining 31 entries (#2, #5, #7 to #10, #12 to #36) are listed with no paper.