Image Captioning on Flickr30k Captions test

Metric: CIDEr (higher is better)

Results

#	Model	CIDEr	Extra Data	Paper	Date	Code
1	Unified VLP	67.4	No	Unified Vision-Language Pre-Training for Image C...	2019-09-24	Code
2	KOSMOS-1 1.6B (zero-shot)	67.1	No	-	-	-
3	Cornia et al	46.4	Yes	Paying More Attention to Saliency: Image Caption...	2017-06-26	-
4	MetaLM	43.3	No	Language Models are General-Purpose Interfaces	2022-06-13	Code
5	FewVLM	31	No	A Good Prompt Is Worth Millions of Parameters: L...	2021-10-16	Code
6	BRNN	24.7	No	Deep Visual-Semantic Alignments for Generating I...	2014-12-07	Code
7	VL-T5	2.6	No	Unifying Vision-and-Language Tasks via Text Gene...	2021-02-04	Code

#1 Unified VLP (SOTA)
67.4 CIDEr · 2019-09-24
Unified Vision-Language Pre-Training for Image Captioning and VQA (Code)

#2 KOSMOS-1 1.6B (zero-shot)
67.1 CIDEr
No paper

#3 Cornia et al (SOTA)
46.4 CIDEr · Extra Data · 2017-06-26
Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

#4 MetaLM
43.3 CIDEr · 2022-06-13
Language Models are General-Purpose Interfaces (Code)

#5 FewVLM
31 CIDEr · 2021-10-16
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (Code)

#6 BRNN (SOTA)
24.7 CIDEr · 2014-12-07
Deep Visual-Semantic Alignments for Generating Image Descriptions (Code)

#7 VL-T5
2.6 CIDEr · 2021-02-04
Unifying Vision-and-Language Tasks via Text Generation (Code)