EmbraceNet for Activity: A Deep Multimodal Fusion Architecture for Activity Recognition
Jun-Ho Choi, Jong-Seok Lee
Abstract
Human activity recognition using multiple sensors has been a challenging but promising task over recent decades. In this paper, we propose a deep multimodal fusion model for activity recognition based on the recently proposed feature fusion architecture named EmbraceNet. Our model processes the data from each sensor independently, combines the features with the EmbraceNet architecture, and post-processes the fused feature to predict the activity. In addition, we introduce further techniques to boost the performance of our model. We submitted the results obtained from our proposed model to the SHL recognition challenge under the team name "Yonsei-MCML."
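The fusion step the abstract refers to can be sketched as follows. In EmbraceNet, each modality is first passed through its own "docking" layer that projects it to a common feature size; the "embracement" layer then draws, for each output dimension, which modality contributes that dimension (a multinomial sample over modality probabilities) and sums the masked vectors. The sketch below is a minimal, illustrative NumPy version; all dimensions, weights, and variable names are hypothetical, not taken from the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def docking(x, w, b):
    # Per-modality docking layer: linear projection to the common
    # feature size, followed by a ReLU.
    return np.maximum(w @ x + b, 0.0)

def embrace(docked, probs, rng):
    # docked: list of m vectors, each of length c (the common size).
    # For each of the c output dimensions, sample which modality
    # contributes it, then sum the mutually exclusive masked vectors.
    m, c = len(docked), docked[0].shape[0]
    choice = rng.choice(m, size=c, p=probs)
    masks = [(choice == k).astype(float) for k in range(m)]
    return sum(masks[k] * docked[k] for k in range(m))

# Toy example: two "sensor" modalities with different input sizes.
c = 8  # common docked feature size (illustrative)
x1, x2 = rng.normal(size=4), rng.normal(size=6)
w1, b1 = rng.normal(size=(c, 4)), np.zeros(c)
w2, b2 = rng.normal(size=(c, 6)), np.zeros(c)
d1, d2 = docking(x1, w1, b1), docking(x2, w2, b2)
fused = embrace([d1, d2], probs=[0.5, 0.5], rng=rng)
```

Because the per-dimension masks are mutually exclusive, every entry of `fused` comes from exactly one modality's docked vector, which is what lets the model tolerate a missing or noisy modality at inference time by reweighting `probs`.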