Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ECSNet: Spatio-Temporal Feature Learning for Event Camera

Zhiwen Chen, Jinjian Wu, Junhui Hou, Leida Li, Weisheng Dong, Guangming Shi

2022-08-29 · IEEE Transactions on Circuits and Systems for Video Technology, 2022

Tasks: Event-based Vision · Gesture Generation · Action Recognition · Event Data Classification

Paper · PDF · Code

Abstract

Neuromorphic event cameras can efficiently sense the latent geometric structure and motion cues of a scene by generating asynchronous, sparse event signals. Due to the irregular layout of these signals, leveraging their plentiful spatio-temporal information for recognition tasks remains a significant challenge. Existing methods tend to treat events as dense image-like or sparse point-series representations; however, they either severely destroy the sparsity of the event data or fail to encode robust spatial cues. To fully exploit the inherent sparsity while reconciling the spatio-temporal information, we introduce a compact event representation, the 2D-1T event cloud sequence (2D-1T ECS). We couple this representation with a novel lightweight spatio-temporal learning framework (ECSNet) that accommodates both object classification and action recognition tasks. The core of our framework is a hierarchical spatial relation module. Equipped with a specially designed surface-event-based sampling unit and a local event normalization unit to enhance inter-event relation encoding, this module learns robust geometric features from the 2D event clouds. We also propose a motion attention module that efficiently captures long-term temporal context evolving with the 1T cloud sequence. Empirically, experiments show that our framework achieves performance on par with or better than the state of the art. Importantly, our approach cooperates well with the sparsity of event data without any sophisticated operations, leading to low computational costs and high inference speeds.
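To make the 2D-1T ECS idea concrete, here is a minimal NumPy sketch of one plausible construction: the event stream is split into equal-duration temporal slices, and each slice is kept as a sparse 2D point cloud, so the sequence index carries the 1T temporal dimension. This is an illustration under stated assumptions, not the authors' implementation: the (x, y, t, polarity) array layout, the equal-duration slicing, the fixed-size random resampling (standing in for the paper's surface-event-based sampling unit), and the helper name events_to_2d1t_ecs are all hypothetical choices for this sketch.

import numpy as np

def events_to_2d1t_ecs(events: np.ndarray, num_slices: int = 8,
                       points_per_slice: int = 1024,
                       rng: np.random.Generator | None = None) -> np.ndarray:
    """events: (N, 4) array of (x, y, t, polarity).
    Returns a (num_slices, points_per_slice, 3) array: a sequence (1T)
    of fixed-size 2D event clouds, each point carrying (x, y, polarity)."""
    rng = rng or np.random.default_rng(0)
    t = events[:, 2]
    # Partition the stream into equal-duration temporal slices (an assumption;
    # equal-count slicing would be an equally plausible choice).
    edges = np.linspace(t.min(), t.max() + 1e-9, num_slices + 1)
    clouds = []
    for i in range(num_slices):
        mask = (t >= edges[i]) & (t < edges[i + 1])
        cloud = events[mask][:, [0, 1, 3]]  # keep spatial coords + polarity
        if len(cloud) == 0:
            cloud = np.zeros((1, 3))        # placeholder for an empty slice
        # Random resampling to a fixed size stands in for the paper's
        # surface-event-based sampling unit (again, an assumption).
        idx = rng.choice(len(cloud), points_per_slice,
                         replace=len(cloud) < points_per_slice)
        clouds.append(cloud[idx])
    return np.stack(clouds)  # (T, P, 3): 2D clouds indexed by the 1T sequence

Fixing each slice to the same number of points is a common convenience when the downstream module is point-based (as the hierarchical spatial relation module appears to be), since such networks typically expect fixed-size inputs per sample.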

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
DeepGesture: A conversational gesture synthesis system based on emotions and semantics (2025-07-03)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)