Metric: CIDEr (higher is better)
| # | Model | CIDEr | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | 3D CoCa | 52.84 | No | 3D CoCa: Contrastive Learners are 3D Captioners | 2025-04-13 | Code |
| 2 | BiCA | 48.77 | No | Bi-directional Contextual Attention for 3D Dense... | 2024-08-13 | - |
| 3 | Vote2Cap-DETR++ | 47.08 | No | Vote2Cap-DETR++: Decoupling Localization and Des... | 2023-09-06 | Code |
| 4 | Vote2Cap-DETR | 43.84 | No | End-to-End 3D Dense Captioning with Vote2Cap-DETR | 2023-01-06 | Code |
| 5 | 3DJCG | 38.06 | No | - | - | - |
| 6 | Contextual | 35.26 | No | Contextual Modeling for 3D Dense Captioning on P... | 2022-10-08 | - |
| 7 | REMAN | 34.81 | No | - | - | - |
| 8 | D3Net | 33.85 | No | D3Net: A Unified Speaker-Listener Architecture f... | 2021-12-02 | - |
| 9 | SpaCap3D | 33.71 | No | Spatiality-guided Transformer for 3D Dense Capti... | 2022-04-22 | Code |
| 10 | Scan2Cap | 27.47 | No | Scan2Cap: Context-aware Dense Captioning in RGB-... | 2020-12-03 | - |