Temporal Gaussian Mixture Layer for Videos
AJ Piergiovanni, Michael S. Ryoo
Abstract
We introduce a new convolutional layer named the Temporal Gaussian Mixture (TGM) layer and present how it can be used to efficiently capture longer-term temporal information in continuous activity videos. The TGM layer is a temporal convolutional layer governed by a much smaller set of parameters (e.g., location/variance of Gaussians) that are fully differentiable. We present fully convolutional video models with multiple TGM layers for activity detection. Extensive experiments on multiple datasets, including Charades and MultiTHUMOS, confirm the effectiveness of TGM layers, significantly outperforming the state of the art.
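The core idea, a temporal convolution whose kernel weights are generated from a few Gaussian parameters rather than learned freely, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names and the per-channel 1-D setup are assumptions, and the real layer mixes multiple Gaussians per output kernel and trains the parameters by backpropagation.

```python
import numpy as np

def tgm_kernels(centers, widths, length):
    """Build temporal kernels from Gaussian parameters (sketch).

    centers, widths : arrays of shape (M,) -- the only learned
        parameters per kernel (location and spread of a Gaussian),
        mirroring the paper's low-parameter design.
    length : number of temporal taps L in each kernel.
    Returns an (M, L) array; each kernel is normalized to sum to 1
    over time, so it acts as a soft temporal attention window.
    """
    t = np.arange(length, dtype=float)
    # Gaussian response of each kernel at each temporal position
    k = np.exp(-((t[None, :] - centers[:, None]) ** 2)
               / (2.0 * widths[:, None] ** 2))
    return k / k.sum(axis=1, keepdims=True)

def tgm_conv1d(x, kernels):
    """Convolve a per-frame feature sequence x (shape (T,)) with each
    generated kernel, producing an (M, T) temporally smoothed output."""
    return np.stack([np.convolve(x, k, mode="same") for k in kernels])

# Two hypothetical kernels: one peaked early, one broad and centered
K = tgm_kernels(np.array([2.0, 5.0]), np.array([1.0, 2.5]), length=11)
out = tgm_conv1d(np.sin(np.linspace(0, 3, 50)), K)
```

Because the kernel is a differentiable function of `centers` and `widths`, gradients flow through the generated weights back to just 2M scalars per kernel bank, which is what lets TGM layers cover long temporal extents cheaply compared to free-form temporal convolutions.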
Results
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Action Detection | MultiTHUMOS | mAP | 46.4 | TGM |
| Action Detection | Charades | mAP | 22.3 | TGM (RGB+Flow) |