Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection

Kyle Min, Sourya Roy, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar

Published: 2022-07-15
Tasks: Audio-Visual Active Speaker Detection, Graph Learning, Node Classification, Active Speaker Detection

Abstract

Active speaker detection (ASD) in videos with multiple speakers is a challenging task, as it requires learning effective audiovisual features and spatial-temporal correlations over long temporal windows. In this paper, we present SPELL, a novel spatial-temporal graph learning framework that can solve complex tasks such as ASD. To this end, each person in a video frame is first encoded in a unique node for that frame. Nodes corresponding to a single person across frames are connected to encode their temporal dynamics. Nodes within a frame are also connected to encode inter-person relationships. Thus, SPELL reduces ASD to a node classification task. Importantly, SPELL is able to reason over long temporal contexts for all nodes without relying on computationally expensive fully connected graph neural networks. Through extensive experiments on the AVA-ActiveSpeaker dataset, we demonstrate that learning graph-based representations can significantly improve active speaker detection performance owing to their explicit spatial and temporal structure. SPELL outperforms all previous state-of-the-art approaches while requiring significantly lower memory and computational resources. Our code is publicly available at https://github.com/SRA2/SPELL.
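
The graph construction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it only mirrors the three rules stated above: one node per person per frame, temporal edges between nodes of the same person, and spatial edges between different people in the same frame. The function name `build_graph`, the `(frame_index, person_id)` input format, and the `max_frame_gap` parameter are assumptions made for illustration. Limiting temporal edges to nearby frames is one plausible way to avoid a fully connected graph, not necessarily the paper's rule.

```python
from itertools import combinations


def build_graph(detections, max_frame_gap=1):
    """Build a sparse spatial-temporal graph from face detections.

    detections: list of (frame_index, person_id) tuples, one per node.
    max_frame_gap: hypothetical parameter; the same person is linked across
        frames only if the frames are at most this far apart.
    Returns (nodes, edges), where edges is a set of undirected index
    pairs (i, j) with i < j.
    """
    nodes = list(detections)
    edges = set()
    for i, j in combinations(range(len(nodes)), 2):
        (frame_i, person_i), (frame_j, person_j) = nodes[i], nodes[j]
        same_frame = frame_i == frame_j
        same_person = person_i == person_j
        close_in_time = abs(frame_i - frame_j) <= max_frame_gap
        if same_frame and not same_person:
            # Spatial edge: two different people visible in the same frame.
            edges.add((i, j))
        elif same_person and not same_frame and close_in_time:
            # Temporal edge: the same person in nearby frames.
            edges.add((i, j))
    return nodes, edges


# Two people (A, B) in frames 0 and 1; only A remains in frame 2.
detections = [(0, "A"), (0, "B"), (1, "A"), (1, "B"), (2, "A")]
nodes, edges = build_graph(detections, max_frame_gap=1)
print(sorted(edges))
```

Each node would then carry an audiovisual feature vector and receive a speaking / not-speaking label, which is what makes ASD a node classification problem in this formulation.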

Results

Task                 Dataset  Metric  Value  Model
Node Classification  AVA      mAP     93.5   ASDNet [ASDNet_ICCV2021]
Node Classification  AVA      mAP     92.3   TalkNet [tao2021someone]
Node Classification  AVA      mAP     92     UniCon [zhang2021unicon]
Node Classification  AVA      mAP     88.8   MAAS-TAN [MAAS2021]

Related Papers

SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
A Graph-in-Graph Learning Framework for Drug-Target Interaction Prediction (2025-07-15)
Graph World Model (2025-07-14)
Federated Learning with Graph-Based Aggregation for Traffic Forecasting (2025-07-13)
Graph Learning (2025-07-08)
GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning (2025-07-04)
S2FGL: Spatial Spectral Federated Graph Learning (2025-07-03)
Interpretable Hierarchical Concept Reasoning through Attention-Guided Graph Learning (2025-06-26)