Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Gimme Signals: Discriminative signal encoding for multimodal activity recognition

Raphael Memmesheimer, Nick Theisen, Dietrich Paulus

2020-03-13 · Skeleton Based Action Recognition · Multimodal Activity Recognition · Action Recognition · Activity Recognition

Paper · PDF · Code (official) · Code

Abstract

We present a simple yet effective and flexible method for action recognition supporting multiple sensor modalities. Multivariate signal sequences are encoded in an image and then classified using the recently proposed EfficientNet CNN architecture. Our focus was to find an approach that generalizes well across different sensor modalities without specific adaptations while still achieving good results. We apply our method to 4 action recognition datasets containing skeleton sequences, inertial and motion capturing measurements, as well as Wi-Fi fingerprints, with up to 120 action classes. Our method defines the current best CNN-based approach on the NTU RGB+D 120 dataset, lifts the state of the art on the ARIL Wi-Fi dataset by +6.78%, improves the UTD-MHAD inertial baseline by +14.4% and the UTD-MHAD skeleton baseline by +1.13%, and achieves 96.11% on the Simitate motion capturing data (80/20 split). We further demonstrate experiments on both modality fusion at the signal level and signal reduction to prevent the representation from overloading.
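The core idea — rasterizing a multivariate signal sequence into an image that a standard CNN can classify — can be sketched as follows. This is a simplified illustration, not the paper's implementation: the function name, canvas size, and per-channel color scheme are assumptions, and the actual encoding details (and the EfficientNet classifier) are described in the paper and official code.

```python
import numpy as np

def encode_signals_as_image(signals, height=224, width=224):
    """Rasterize a (T x C) multivariate signal sequence into an RGB image.

    Each signal channel is normalized to [0, 1] and drawn as a colored
    trace on a white canvas, so that a standard image CNN (e.g. an
    EfficientNet) can classify the resulting picture. Hypothetical
    sketch of the signal-to-image idea, not the paper's exact encoding.
    """
    T, C = signals.shape
    img = np.ones((height, width, 3), dtype=np.float32)  # white canvas

    # Normalize each channel independently to [0, 1] for vertical placement.
    mins = signals.min(axis=0, keepdims=True)
    spans = np.maximum(signals.max(axis=0, keepdims=True) - mins, 1e-8)
    norm = (signals - mins) / spans

    # Assign each channel a distinct color via simple RGB interpolation.
    colors = np.stack([np.linspace(0.0, 1.0, C),
                       np.linspace(1.0, 0.0, C),
                       np.full(C, 0.5)], axis=1)

    # Map time steps to x-coordinates and signal values to y-coordinates.
    xs = np.linspace(0, width - 1, T).astype(int)
    for c in range(C):
        ys = ((1.0 - norm[:, c]) * (height - 1)).astype(int)
        img[ys, xs] = colors[c]
    return img

# Example: a 3-channel synthetic inertial-like sequence of 100 time steps.
rng = np.random.default_rng(0)
sig = np.cumsum(rng.standard_normal((100, 3)), axis=0)
image = encode_signals_as_image(sig)
print(image.shape)  # (224, 224, 3)
```

Because the encoding only needs a (time x channels) array, the same pipeline applies unchanged to skeleton joints, inertial axes, or Wi-Fi fingerprint dimensions — which is the modality-agnostic property the abstract emphasizes.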

Results

Task                 | Dataset       | Metric                   | Value | Model
Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Setup)   | 70.8  | Gimme Signals (AIS)
Activity Recognition | NTU RGB+D 120 | Accuracy (Cross-Subject) | 71.59 | Gimme Signals (AIS)
Activity Recognition | UTD-MHAD      | Accuracy (CS)            | 93.33 | Gimme Signals (Skeleton, AIS)
Action Recognition   | NTU RGB+D 120 | Accuracy (Cross-Setup)   | 70.8  | Gimme Signals (AIS)
Action Recognition   | NTU RGB+D 120 | Accuracy (Cross-Subject) | 71.59 | Gimme Signals (AIS)

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs (2025-07-15)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
SEZ-HARN: Self-Explainable Zero-shot Human Activity Recognition Network (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)