Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


State Space Models for Event Cameras

Nikola Zubić, Mathias Gehrig, Davide Scaramuzza

2024-02-23 · CVPR 2024 · Event-based Vision · Object Detection

Paper · PDF · Code (official)

Abstract

Today, state-of-the-art deep neural networks that process event-camera data first convert a temporal window of events into dense, grid-like input representations. As such, they exhibit poor generalizability when deployed at higher inference frequencies (i.e., smaller temporal windows) than the ones they were trained on. We address this challenge by introducing state-space models (SSMs) with learnable timescale parameters to event-based vision. This design adapts to varying frequencies without the need to retrain the network at different frequencies. Additionally, we investigate two strategies to counteract aliasing effects when deploying the model at higher frequencies. We comprehensively evaluate our approach against existing methods based on RNN and Transformer architectures across various benchmarks, including Gen1 and 1 Mpx event camera datasets. Our results demonstrate that SSM-based models train 33% faster and also exhibit minimal performance degradation when tested at higher frequencies than the training input. Traditional RNN and Transformer models exhibit performance drops of more than 20 mAP, with SSMs having a drop of 3.76 mAP, highlighting the effectiveness of SSMs in event-based vision tasks.
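The key mechanism described in the abstract — a learnable timescale parameter that lets the same trained model run at a different inference frequency — can be illustrated with zero-order-hold discretization of a diagonal continuous-time SSM. The sketch below is not the authors' code; the parameter values are toy assumptions. Halving the timescale `delta` corresponds to doubling the inference frequency, with no change to the underlying continuous parameters A, B, C:

```python
import numpy as np

def discretize_zoh(A, B, delta):
    """Zero-order-hold discretization of a diagonal continuous-time SSM.
    A: (N,) diagonal state matrix, B: (N,) input vector, delta: timescale."""
    A_bar = np.exp(delta * A)
    B_bar = (A_bar - 1.0) / A * B
    return A_bar, B_bar

def ssm_scan(A_bar, B_bar, C, u):
    """Linear recurrence x_k = A_bar * x_{k-1} + B_bar * u_k, y_k = C @ x_k."""
    x = np.zeros_like(A_bar)
    ys = []
    for u_k in u:
        x = A_bar * x + B_bar * u_k
        ys.append(float(C @ x))
    return np.array(ys)

# Toy diagonal SSM with N = 4 states (illustrative values, not from the paper)
rng = np.random.default_rng(0)
A = -np.abs(rng.standard_normal(4))   # stable (negative) poles
B = rng.standard_normal(4)
C = rng.standard_normal(4)

u = rng.standard_normal(32)

# Training-time window: delta = 1.0
y_train = ssm_scan(*discretize_zoh(A, B, 1.0), C, u)
# Doubled inference frequency: same A, B, C, but delta = 0.5
y_fast = ssm_scan(*discretize_zoh(A, B, 0.5), C, np.repeat(u, 2))
```

Because only `delta` changes between the two calls, the discrete transition matrices are re-derived from the same continuous parameters — this is what allows adaptation to a new frequency without retraining.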

Results

Task               Dataset          Model       mAP    Params (M)
Object Detection   GEN1 Detection   S5-ViT-B    47.4   18.2
Object Detection   GEN1 Detection   S4D-ViT-B   46.2   16.5

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)