Image Captioning on ScanRefer Dataset

Metric: CIDEr (higher is better)

Results

#	Model	CIDEr	Extra Data	Paper	Date	Code
1	3D CoCa	85.42	No	3D CoCa: Contrastive Learners are 3D Captioners	2025-04-13	Code
2	See It All	83.14	No	See It All: Contextualized Late Aggregation for 3D Dense Captioning	2024-08-14	-
3	BiCA	80.14	No	Bi-directional Contextual Attention for 3D Dense Captioning	2024-08-13	-
4	Vote2Cap-DETR++	76.36	No	Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning	2023-09-06	Code
5	Vote2Cap-DETR	71.45	No	End-to-End 3D Dense Captioning with Vote2Cap-DETR	2023-01-06	Code
6	3DJCG	60.86	No	-	-	-
7	MORE	58.89	No	MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes	2022-03-10	Code
8	SpaCap3D	58.06	No	Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds	2022-04-22	Code
9	Scan2Cap	53.73	No	Scan2Cap: Context-aware Dense Captioning in RGB-D Scans	2020-12-03	-
10	Contextual	50.29	No	Contextual Modeling for 3D Dense Captioning on Point Clouds	2022-10-08	-
11	3D-VLP	50.02	No	-	-	Code
12	X-Trans2Cap	41.52	No	X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning	2022-03-02	Code

#1 3D CoCa (SOTA)
85.42
CIDEr · 2025-04-13
3D CoCa: Contrastive Learners are 3D Captioners · Code
#2 See It All (SOTA)
83.14
CIDEr · 2024-08-14
See It All: Contextualized Late Aggregation for 3D Dense Captioning
#3 BiCA (SOTA)
80.14
CIDEr · 2024-08-13
Bi-directional Contextual Attention for 3D Dense Captioning
#4 Vote2Cap-DETR++ (SOTA)
76.36
CIDEr · 2023-09-06
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning · Code
#5 Vote2Cap-DETR (SOTA)
71.45
CIDEr · 2023-01-06
End-to-End 3D Dense Captioning with Vote2Cap-DETR · Code
#6 3DJCG
60.86
CIDEr
No paper
#7 MORE (SOTA)
58.89
CIDEr · 2022-03-10
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes · Code
#8 SpaCap3D
58.06
CIDEr · 2022-04-22
Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds · Code
#9 Scan2Cap (SOTA)
53.73
CIDEr · 2020-12-03
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
#10 Contextual
50.29
CIDEr · 2022-10-08
Contextual Modeling for 3D Dense Captioning on Point Clouds
#11 3D-VLP
50.02
CIDEr
No paper · Code
#12 X-Trans2Cap
41.52
CIDEr · 2022-03-02
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning · Code