Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

HiCM$^2$: Hierarchical Compact Memory Modeling for Dense Video Captioning

Minkuk Kim, Hyeon Bae Kim, Jinyoung Moon, Jinwoo Choi, Seong Tae Kim

2024-12-19 · Video Captioning · Dense Video Captioning

Abstract

With the growing demand for solutions to real-world video challenges, interest in dense video captioning (DVC) has been on the rise. DVC involves automatically localizing and captioning events in untrimmed videos. Several studies highlight the challenges of DVC and introduce improved methods that use prior knowledge, such as pre-training and external memory. In this research, we propose a model that leverages the prior knowledge of a human-oriented hierarchical compact memory, inspired by the human memory hierarchy and cognition. To mimic human-like memory recall, we construct a hierarchical memory and a hierarchical memory reading module. We build an efficient hierarchical compact memory by clustering memory events and summarizing them with large language models. Comparative experiments demonstrate that this hierarchical memory recall process improves DVC, achieving state-of-the-art performance on the YouCook2 and ViTT datasets.
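
The abstract describes building a compact hierarchical memory by clustering memory events and summarizing each cluster, then reading it hierarchically. The following is a rough sketch of that general idea only, not the authors' implementation. Everything in it is an assumption made for illustration: k-means as the clustering step, cosine similarity for reading, exactly two memory levels, the toy embeddings and captions, and all function names. In particular, `summarize` is a hypothetical placeholder that joins strings where the paper uses a large language model.

```python
import numpy as np


def kmeans(x, k, iters=10):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [x[0]]
    for _ in range(k - 1):
        # Squared distance from every point to its closest chosen centroid.
        d = ((x[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(-1).min(axis=1)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        labels = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centroids[j] = x[labels == j].mean(axis=0)
    labels = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(axis=1)
    return centroids, labels


def summarize(texts):
    """Hypothetical stand-in for LLM summarisation: simply joins the texts."""
    return " / ".join(texts)


def build_memory(embeddings, captions, k_mid, k_top):
    """Two-level compact memory: events -> mid-level clusters -> top-level clusters."""
    mid_vecs, mid_of_event = kmeans(embeddings, k_mid)
    mid_text = [summarize([c for c, m in zip(captions, mid_of_event) if m == j])
                for j in range(k_mid)]
    top_vecs, top_of_mid = kmeans(mid_vecs, k_top)
    top_text = [summarize([t for t, p in zip(mid_text, top_of_mid) if p == j])
                for j in range(k_top)]
    return {"mid_vecs": mid_vecs, "mid_text": mid_text,
            "top_vecs": top_vecs, "top_text": top_text, "top_of_mid": top_of_mid}


def cosine(query, matrix):
    """Cosine similarity between one query vector and each row of a matrix."""
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    return (matrix @ query) / (norms + 1e-12)


def read_memory(memory, query):
    """Top-down read: pick the best top-level node, then the best of its children."""
    top = int(cosine(query, memory["top_vecs"]).argmax())
    children = np.flatnonzero(memory["top_of_mid"] == top)
    mid = int(children[cosine(query, memory["mid_vecs"][children]).argmax()])
    return memory["top_text"][top], memory["mid_text"][mid]


# Toy data: eight "memory events" with made-up 4-d embeddings and captions.
embeddings = np.array([
    [1.00, 0.00, 0.00, 0.00], [0.98, 0.05, 0.00, 0.00],
    [0.70, 0.70, 0.00, 0.00], [0.72, 0.68, 0.00, 0.00],
    [0.00, 0.00, 1.00, 0.00], [0.00, 0.00, 0.98, 0.05],
    [0.00, 0.00, 0.60, 0.80], [0.00, 0.00, 0.62, 0.78],
])
captions = ["crack eggs", "whisk eggs", "heat pan", "add butter",
            "mount tire", "tighten bolts", "inflate tire", "check pressure"]

memory = build_memory(embeddings, captions, k_mid=4, k_top=2)
result = read_memory(memory, np.array([0.6, 0.8, 0.0, 0.0]))
print(result)
```

The read step only compares the query with the few top-level vectors and then with the children of the selected node, so it never scans every stored event. That is the efficiency argument for a compact hierarchy in this sketch; the paper's actual reading module may differ.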

Results

Task                    Dataset   Metric     Value  Model
Video Captioning        YouCook2  BLEU4      6.11   HiCM²
Video Captioning        YouCook2  CIDEr      71.84  HiCM²
Video Captioning        YouCook2  F1         32.51  HiCM²
Video Captioning        YouCook2  METEOR     12.8   HiCM²
Video Captioning        YouCook2  Precision  32.51  HiCM²
Video Captioning        YouCook2  Recall     32.51  HiCM²
Video Captioning        YouCook2  SODA       10.73  HiCM²
Video Captioning        ViTT      CIDEr      51.2   HiCM²
Video Captioning        ViTT      METEOR     9.6    HiCM²
Video Captioning        ViTT      SODA       0.15   HiCM²
Dense Video Captioning  YouCook2  BLEU4      6.11   HiCM²
Dense Video Captioning  YouCook2  CIDEr      71.84  HiCM²
Dense Video Captioning  YouCook2  F1         32.51  HiCM²
Dense Video Captioning  YouCook2  METEOR     12.8   HiCM²
Dense Video Captioning  YouCook2  Precision  32.51  HiCM²
Dense Video Captioning  YouCook2  Recall     32.51  HiCM²
Dense Video Captioning  YouCook2  SODA       10.73  HiCM²
Dense Video Captioning  ViTT      CIDEr      51.2   HiCM²
Dense Video Captioning  ViTT      METEOR     9.6    HiCM²
Dense Video Captioning  ViTT      SODA       0.15   HiCM²
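
The YouCook2 rows list precision, recall, and F1 with the same value, 32.51. This page does not define how F1 is computed. Assuming it is the usual harmonic mean of precision P and recall R, equal inputs give the same output, so the three entries are at least mutually consistent:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \cdot 32.51 \cdot 32.51}{32.51 + 32.51}
    = 32.51
```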

Related Papers

UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks (2025-07-15)
Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization (2025-06-25)
Dense Video Captioning using Graph-based Sentence Summarization (2025-06-25)
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models (2025-06-18)
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks (2025-06-10)
ARGUS: Hallucination and Omission Evaluation in Video-LLMs (2025-06-09)
Temporal Object Captioning for Street Scene Videos from LiDAR Tracks (2025-05-22)
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks (2025-05-19)