Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition

Timur Bagautdinov, Alexandre Alahi, François Fleuret, Pascal Fua, Silvio Savarese

2016-11-28	CVPR 2017 7	Action Localization, Scene Understanding, Action Recognition, Activity Recognition

Abstract

We present a unified framework for understanding human social behaviors in raw image sequences. Our model jointly detects multiple individuals, infers their social actions, and estimates the collective actions with a single feed-forward pass through a neural network. We propose a single architecture that does not rely on external detection algorithms but rather is trained end-to-end to generate dense proposal maps that are refined via a novel inference scheme. The temporal consistency is handled via a person-level matching Recurrent Neural Network. The complete model takes as input a sequence of frames and outputs detections along with the estimates of individual actions and collective activities. We demonstrate state-of-the-art performance of our algorithm on multiple publicly available benchmarks.
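The output structure described in the abstract (per-person actions plus one collective activity per frame) can be illustrated with a minimal sketch. This is not the authors' network: the function name `classify_scene`, the linear classifiers, the random weights, the feature sizes, and the use of max-pooling over people to form a scene-level feature are all assumptions made for illustration only. Detection and the matching Recurrent Neural Network are omitted entirely.

```python
import numpy as np


def classify_scene(person_feats, w_action, w_activity):
    """Hypothetical helper, not taken from the paper.

    person_feats: (num_people, feat_dim) array, one feature row per detected person.
    w_action:     (feat_dim, num_actions) weights of an assumed linear action classifier.
    w_activity:   (feat_dim, num_activities) weights of an assumed linear activity classifier.
    Returns the per-person action indices and a single collective activity index.
    """
    # One action prediction per person.
    action_logits = person_feats @ w_action
    actions = action_logits.argmax(axis=1)

    # Assumption: aggregate people into one scene feature by element-wise max.
    pooled = person_feats.max(axis=0)
    activity = int((pooled @ w_activity).argmax())
    return actions, activity


# Toy inputs: 6 people, 16-dimensional features, arbitrary class counts.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 16))
w_action = rng.normal(size=(16, 9))
w_activity = rng.normal(size=(16, 8))

actions, activity = classify_scene(feats, w_action, w_activity)
print(actions.shape, activity)
```

The point of the sketch is only the shape of the result: one label per detected individual and one label for the whole scene, both computed from the same per-person features in a single pass.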

Results

Task	Dataset	Metric	Value	Model
Activity Recognition	Volleyball	Accuracy	82.6	GTT (VGG19)
Activity Recognition	Volleyball	Accuracy	81.8	SSU (GT)
Action Recognition	Volleyball	Accuracy	82.6	GTT (VGG19)
Action Recognition	Volleyball	Accuracy	81.8	SSU (GT)
