SwinSF: Image Reconstruction from Spatial-Temporal Spike Streams

Liangyan Jiang, Chuang Zhu, Yanxu Chen

2024-07-22 · Image Reconstruction

Abstract

The spike camera, with its high temporal resolution, low latency, and high dynamic range, addresses high-speed imaging challenges such as motion blur. It captures photons at each pixel independently, producing binary spike streams that are rich in temporal information but challenging to reconstruct into images. Current algorithms, both traditional and deep learning-based, do not yet fully exploit this temporal information, nor do they adequately restore fine detail in the reconstructed images. To address this, we introduce Swin Spikeformer (SwinSF), a novel model for dynamic scene reconstruction from spike streams. SwinSF consists of three modules: Spike Feature Extraction, Spatial-Temporal Feature Extraction, and Final Reconstruction. It combines shifted window self-attention with a proposed temporal spike attention, yielding feature extraction that captures both spatial and temporal dynamics and therefore more robust and accurate reconstruction from spike streams. Furthermore, we build a new synthesized dataset for spike image reconstruction whose resolution matches that of the latest spike camera, keeping it relevant to current developments in spike camera imaging. Experimental results show that SwinSF sets a new benchmark, achieving state-of-the-art performance on a range of datasets, including both real-world and synthesized data at various resolutions. Our code and dataset will be made available soon.
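To make "binary spike streams" concrete, the sketch below simulates the integrate-and-fire principle commonly used to describe spike cameras: each pixel accumulates incoming light every time step and emits a 1 when its accumulator reaches a threshold, after which the threshold is subtracted. It then recovers intensity with a simple window-averaging baseline in the spirit of classical "texture from playback" methods. This is an illustrative assumption, not SwinSF itself; the function names `simulate_spikes` and `tfp_reconstruct` are hypothetical, and intensities are assumed normalized to [0, threshold] so that at most one spike fires per pixel per step.

```python
import numpy as np


def simulate_spikes(intensity, num_steps, threshold=1.0, rng=None):
    """Hypothetical integrate-and-fire simulator.

    Each pixel adds its (normalized) intensity to an accumulator every step;
    when the accumulator reaches `threshold` it emits a spike (1) and the
    threshold is subtracted. Returns a (num_steps, H, W) uint8 array.
    """
    # Assumption: intensity lies in [0, threshold], so at most one spike per step.
    intensity = np.clip(np.asarray(intensity, dtype=np.float64), 0.0, threshold)
    acc = np.zeros_like(intensity)
    if rng is not None:
        # Optional random initial charge, so pixels do not all start in phase.
        acc = rng.uniform(0.0, threshold, size=intensity.shape)
    spikes = np.zeros((num_steps,) + intensity.shape, dtype=np.uint8)
    for t in range(num_steps):
        acc += intensity
        fired = acc >= threshold
        spikes[t] = fired
        acc[fired] -= threshold  # reset by subtraction keeps the residual charge
    return spikes


def tfp_reconstruct(spikes, window, threshold=1.0):
    """Window-averaging baseline: spike firing rate over the last `window` steps."""
    return spikes[-window:].mean(axis=0) * threshold


if __name__ == "__main__":
    scene = np.array([[0.25, 0.5], [1.0, 0.0]])
    stream = simulate_spikes(scene, num_steps=8)
    print(stream.shape)
    print(tfp_reconstruct(stream, window=8))
```

For this static toy scene the window average recovers the intensities exactly. With motion, a long window blurs and a short window is noisy; this trade-off is one reason learned models that exploit temporal structure are attractive for spike reconstruction.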

Results

Task	Dataset	Metric	Value	Model
Image Reconstruction	Spike-X4K	Average PSNR (dB)	39.61	SwinSF
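For reference, PSNR is defined as 10 · log10(MAX² / MSE) in decibels, where MAX is the largest possible pixel value. The page does not state the exact evaluation protocol (pixel range, color handling, cropping, or how the average is taken), so the minimal sketch below assumes images scaled to [0, 1] and averages per-image PSNR over a set of reference/estimate pairs; `psnr` and `average_psnr` are hypothetical helper names.

```python
import numpy as np


def psnr(reference, estimate, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val**2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def average_psnr(pairs, max_val=1.0):
    """Mean of per-image PSNR values (assumed averaging scheme)."""
    return float(np.mean([psnr(r, e, max_val) for r, e in pairs]))


if __name__ == "__main__":
    gt = np.zeros((4, 4))
    print(psnr(gt, gt + 0.1))  # MSE = 0.01, so roughly 20 dB
```

Averaging per-image PSNR and computing PSNR from a dataset-wide MSE generally give different numbers, so results are only comparable under the same protocol.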
