Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Temporal Tessellation: A Unified Approach for Video Analysis

Dotan Kaufman, Gil Levi, Tal Hassner, Lior Wolf

2016-12-21 | ICCV 2017 10 | Action Detection, Video Summarization, Video Captioning, Video Understanding

Abstract

We present a general approach to video understanding, inspired by semantic transfer techniques that have been successfully used for 2D image analysis. Our method considers a video to be a 1D sequence of clips, each one associated with its own semantics. The nature of these semantics -- natural language captions or other labels -- depends on the task at hand. A test video is processed by forming correspondences between its clips and the clips of reference videos with known semantics, after which the reference semantics can be transferred to the test video. We describe two matching methods, both designed to ensure that (a) reference clips appear similar to test clips and (b) taken together, the semantics of the selected reference clips are consistent and maintain temporal coherence. We use our method for video captioning on the LSMDC'16 benchmark, video summarization on the SumMe and TVSum benchmarks, temporal action detection on the Thumos2014 benchmark, and sound prediction on the Greatest Hits benchmark. Our method not only surpasses the state of the art on four of the five benchmarks but is also, importantly, the only single method we know of that has been successfully applied to such a diverse range of tasks.
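The two requirements in the abstract, (a) appearance similarity and (b) temporally coherent semantics, can be illustrated with a small dynamic-programming sketch. This is only an illustration under stated assumptions: the abstract does not specify the two matching methods, so the Viterbi-style decoding below, the function name `tessellate`, and the cost matrices `unary` and `pairwise` are all hypothetical and are not taken from the authors' code.

```python
def tessellate(unary, pairwise):
    """Pick one reference clip per test clip with Viterbi-style decoding.

    unary[t][r]:    assumed appearance cost of matching test clip t to
                    reference clip r (lower means more similar).
    pairwise[p][r]: assumed coherence cost of the semantics of reference
                    clip p being followed by those of reference clip r.
    Returns (path, total_cost), where path[t] is the chosen reference index.
    """
    n_test, n_ref = len(unary), len(unary[0])
    cost = list(unary[0])   # best cost of any path ending at each reference clip
    back = []               # back-pointers for steps 1..n_test-1

    for t in range(1, n_test):
        new_cost, pointers = [], []
        for r in range(n_ref):
            # Cheapest predecessor for reference clip r at step t.
            best_prev = min(range(n_ref), key=lambda p: cost[p] + pairwise[p][r])
            new_cost.append(cost[best_prev] + pairwise[best_prev][r] + unary[t][r])
            pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)

    # Backtrack from the cheapest final state.
    last = min(range(n_ref), key=lambda r: cost[r])
    total_cost = cost[last]
    path = [last]
    for pointers in reversed(back):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path, total_cost


# Toy example: 3 test clips, 2 reference clips, switching costs 0.5.
unary = [[0.1, 0.9], [0.6, 0.5], [0.2, 0.9]]
pairwise = [[0.0, 0.5], [0.5, 0.0]]
path, total_cost = tessellate(unary, pairwise)
# path == [0, 0, 0]
```

In this toy example, matching each clip independently to its nearest reference would give `[0, 1, 0]`, while the coherence term keeps the selection on reference clip 0 throughout. The semantics attached to the chosen reference clips would then be transferred to the test video.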

Results

Task             Dataset   Metric                      Value   Model
Video            MSR-VTT   text-to-video Median Rank   41      Kaufman
Video            MSR-VTT   text-to-video R@1           4.7     Kaufman
Video            MSR-VTT   text-to-video R@10          24.1    Kaufman
Video            MSR-VTT   video-to-text R@5           16.6    Kaufman
Video Retrieval  MSR-VTT   text-to-video Median Rank   41      Kaufman
Video Retrieval  MSR-VTT   text-to-video R@1           4.7     Kaufman
Video Retrieval  MSR-VTT   text-to-video R@10          24.1    Kaufman
Video Retrieval  MSR-VTT   video-to-text R@5           16.6    Kaufman
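
The retrieval metrics listed above, R@K and Median Rank, are conventionally computed from the 1-based rank at which the ground-truth item appears for each query. The sketch below shows that convention only; the helper names and the sample ranks are made up for illustration and do not reproduce the evaluation protocol behind the reported numbers.

```python
from statistics import median


def recall_at_k(ranks, k):
    """Percentage of queries whose ground-truth item is ranked in the top k.

    ranks: 1-based rank of the correct item for each query.
    """
    hits = sum(1 for r in ranks if r <= k)
    return 100.0 * hits / len(ranks)


def median_rank(ranks):
    """Median of the ground-truth ranks (lower is better)."""
    return median(ranks)


# Hypothetical ranks for five queries.
ranks = [1, 3, 12, 50, 7]
r_at_1 = recall_at_k(ranks, 1)
r_at_5 = recall_at_k(ranks, 5)
r_at_10 = recall_at_k(ranks, 10)
med_r = median_rank(ranks)
```

Higher R@K is better, while a lower Median Rank is better.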

Related Papers

VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding (2025-07-17)
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks (2025-07-15)
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments (2025-07-14)
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI (2025-07-14)
Beyond Appearance: Geometric Cues for Robust Video Instance Segmentation (2025-07-08)
Omni-Video: Democratizing Unified Video Understanding and Generation (2025-07-08)
MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding (2025-07-08)
Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models (2025-07-08)