Image Retrieval on PhotoChat

Metric: Sum(R@1,5,10) (higher is better)
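
The metric name suggests the sum of Recall@1, Recall@5, and Recall@10. The sketch below shows how such a score could be computed. It assumes each recall is a percentage (so the maximum is 300) and that every query has exactly one ground-truth image. The function names and the example ranks are hypothetical and are not taken from any paper listed here.

```python
from typing import Sequence


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Percentage of queries whose ground-truth image is ranked in the top k.

    `ranks` holds the 1-based rank of the correct image for each query
    (assumption: exactly one correct image per query).
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    hits = sum(1 for r in ranks if r <= k)
    return 100.0 * hits / len(ranks)


def sum_of_recalls(ranks: Sequence[int], ks: Sequence[int] = (1, 5, 10)) -> float:
    """Sum of Recall@k over the given cutoffs, i.e. Sum(R@1,5,10) by default."""
    return sum(recall_at_k(ranks, k) for k in ks)


# Hypothetical ranks for ten queries, for illustration only.
example_ranks = [1, 3, 12, 7, 1, 25, 5, 2, 40, 10]
score = sum_of_recalls(example_ranks)
print(f"Sum(R@1,5,10) = {score:.1f}")
```

Under these assumptions, a score of 101.5 means the three recall percentages add up to 101.5 out of a possible 300.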

Results

#	Model	Sum(R@1,5,10)	Extra Data	Paper	Date	Code
1	PaCE	101.5	No	PaCE: Unified Multi-modal Dialogue Pre-training ...	2023-05-24	Code
2	VLMo	83.2	No	VLMo: Unified Vision-Language Pre-Training with ...	2021-11-03	Code
3	SCAN	74.5	No	Stacked Cross Attention for Image-Text Matching	2018-03-21	Code
4	DE++	71.1	No	PhotoChat: A Human-Human Dialogue Dataset with P...	2021-07-06	-
5	ViLT	71	No	ViLT: Vision-and-Language Transformer Without Co...	2021-02-05	Code

#1 PaCE (SOTA): 101.5 Sum(R@1,5,10), 2023-05-24. Paper: PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts. Code available.
#2 VLMo (SOTA): 83.2 Sum(R@1,5,10), 2021-11-03. Paper: VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts. Code available.
#3 SCAN (SOTA): 74.5 Sum(R@1,5,10), 2018-03-21. Paper: Stacked Cross Attention for Image-Text Matching. Code available.
#4 DE++: 71.1 Sum(R@1,5,10), 2021-07-06. Paper: PhotoChat: A Human-Human Dialogue Dataset with Photo Sharing Behavior for Joint Image-Text Modeling.
#5 ViLT: 71 Sum(R@1,5,10), 2021-02-05. Paper: ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision. Code available.