Metric: BLEU-4 (higher is better)
| # | Model | BLEU-4 | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | 3D CoCa | 45.56 | No | 3D CoCa: Contrastive Learners are 3D Captioners | 2025-04-13 | Code |
| 2 | See It All | 42.17 | No | See It All: Contextualized Late Aggregation for ... | 2024-08-14 | - |
| 3 | Vote2Cap-DETR++ | 41.37 | No | Vote2Cap-DETR++: Decoupling Localization and Des... | 2023-09-06 | Code |
| 4 | BiCA | 40.16 | No | Bi-directional Contextual Attention for 3D Dense... | 2024-08-13 | - |
| 5 | 3DJCG | 39.67 | No | - | - | - |
| 6 | Vote2Cap-DETR | 39.34 | No | End-to-End 3D Dense Captioning with Vote2Cap-DETR | 2023-01-06 | Code |
| 7 | MORE | 35.41 | No | MORE: Multi-Order RElation Mining for Dense Capt... | 2022-03-10 | Code |
| 8 | SpaCap3D | 35.3 | No | Spatiality-guided Transformer for 3D Dense Capti... | 2022-04-22 | Code |
| 9 | Scan2Cap | 34.25 | No | Scan2Cap: Context-aware Dense Captioning in RGB-... | 2020-12-03 | - |
| 10 | 3D-VLP | 31.87 | No | - | - | Code |
| 11 | Contextual | 26.64 | No | Contextual Modeling for 3D Dense Captioning on P... | 2022-10-08 | - |
| 12 | X-Trans2Cap | 23.83 | No | X-Trans2Cap: Cross-Modal Knowledge Transfer usin... | 2022-03-02 | Code |