Image Captioning on nocaps entire
Metric: CIDEr (higher is better)
Results
| # | Model | CIDEr | Extra Data | Paper | Date | Code | SOTA |
|---|-------|-------|------------|-------|------|------|------|
| 1 | Lyrics | 126.8 | No | Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | 2023-12-08 | - | Yes |
| 2 | GIT, Single Model | 123.39 | No | GIT: A Generative Image-to-text Transformer for Vision and Language | 2022-05-27 | Yes | Yes |
| 3 | CoCa - Google Brain | 120.55 | No | - | - | - | - |
| 4 | Microsoft Cognitive Services team | 114.25 | No | Scaling Up Vision-Language Pre-training for Image Captioning | 2021-11-24 | - | Yes |
| 5 | Prismer | 110.84 | No | Prismer: A Vision-Language Model with Multi-Task Experts | 2023-03-04 | Yes | - |
| 6 | Single Model | 110.31 | No | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | 2021-08-24 | Yes | Yes |
| 7 | FudanFVL | 108.29 | No | - | - | - | - |
| 8 | FudanWYZ | 106.81 | No | - | - | - | - |
| 9 | IEDA-LAB | 98.08 | No | - | - | - | - |
| 10 | firethehole | 97.61 | No | - | - | - | - |
| 11 | vll@mk514 | 93.45 | No | - | - | - | - |
| 12 | MD | 93 | No | - | - | - | - |
| 13 | VinVL (Microsoft Cognitive Services + MSR) | 92.46 | No | VinVL: Revisiting Visual Representations in Vision-Language Models | 2021-01-02 | Yes | Yes |
| 14 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 87.56 | No | - | - | - | - |
| 15 | icgp2ssi1_coco_si_0.02_5_test | 87.34 | No | - | - | - | - |
| 16 | evertyhing | 86 | No | - | - | - | - |
| 17 | Human | 85.34 | No | - | - | - | - |
| 18 | RCAL | 82.88 | No | - | - | - | - |
| 19 | Oscar | 80.93 | No | - | - | - | - |
| 20 | vinvl_yuan_cbs | 79.04 | No | - | - | - | - |
| 21 | cxy_nocaps_training | 78.48 | No | - | - | - | - |
| 22 | Xinyi | 78.23 | No | - | - | - | - |
| 23 | camel XE | 75.88 | No | - | - | - | - |
| 24 | MQ-UpDown-C | 75.58 | No | - | - | - | - |
| 25 | UpDown + ELMo + CBS | 73.09 | No | - | - | - | - |
| 26 | ClipCap (Transformer) | 65.83 | No | ClipCap: CLIP Prefix for Image Captioning | 2021-11-18 | Yes | - |
| 27 | ClipCap (MLP + GPT2 tuning) | 65.7 | No | ClipCap: CLIP Prefix for Image Captioning | 2021-11-18 | Yes | - |
| 28 | Neural Baby Talk + CBS | 61.48 | No | - | - | - | - |
| 29 | 7_10-7_40000_predict_test.json | 61.48 | No | - | - | - | - |
| 30 | None | 55.97 | No | - | - | - | - |
| 31 | nocaps_training | 54.25 | No | - | - | - | - |
| 32 | UpDown | 54.25 | No | - | - | - | - |
| 33 | Neural Baby Talk | 53.36 | No | - | - | - | - |
| 34 | YX | 49.02 | No | - | - | - | - |
| 35 | area_attention | 48.29 | No | - | - | - | - |
| 36 | B2 | 47.69 | No | - | - | - | - |
| 37 | Yu-Wu | 46.18 | No | - | - | - | - |
| 38 | coco_all_19 | 45.27 | No | - | - | - | - |
| 39 | CS395T | 39.33 | No | - | - | - | - |