Metric: CIDEr (higher is better)
| # | Model | CIDEr | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | 3D CoCa | 85.42 | No | 3D CoCa: Contrastive Learners are 3D Captioners | 2025-04-13 | Code |
| 2 | See It All | 83.14 | No | See It All: Contextualized Late Aggregation for ... | 2024-08-14 | - |
| 3 | BiCA | 80.14 | No | Bi-directional Contextual Attention for 3D Dense... | 2024-08-13 | - |
| 4 | Vote2Cap-DETR++ | 76.36 | No | Vote2Cap-DETR++: Decoupling Localization and Des... | 2023-09-06 | Code |
| 5 | Vote2Cap-DETR | 71.45 | No | End-to-End 3D Dense Captioning with Vote2Cap-DETR | 2023-01-06 | Code |
| 6 | 3DJCG | 60.86 | No | - | - | - |
| 7 | MORE | 58.89 | No | MORE: Multi-Order RElation Mining for Dense Capt... | 2022-03-10 | Code |
| 8 | SpaCap3D | 58.06 | No | Spatiality-guided Transformer for 3D Dense Capti... | 2022-04-22 | Code |
| 9 | Scan2Cap | 53.73 | No | Scan2Cap: Context-aware Dense Captioning in RGB-... | 2020-12-03 | - |
| 10 | Contextual | 50.29 | No | Contextual Modeling for 3D Dense Captioning on P... | 2022-10-08 | - |
| 11 | 3D-VLP | 50.02 | No | - | - | Code |
| 12 | X-Trans2Cap | 41.52 | No | X-Trans2Cap: Cross-Modal Knowledge Transfer usin... | 2022-03-02 | Code |