FlowFormer: A Transformer Architecture for Optical Flow

Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Ka Chun Cheung, Hongwei Qin, Jifeng Dai, Hongsheng Li

2022-03-30 | Optical Flow Estimation

Abstract

We introduce the optical Flow transFormer (FlowFormer), a transformer-based neural network architecture for learning optical flow. FlowFormer tokenizes the 4D cost volume built from an image pair, encodes the cost tokens into a cost memory with alternate-group transformer (AGT) layers in a novel latent space, and decodes the cost memory via a recurrent transformer decoder with dynamic positional cost queries. On the Sintel benchmark, FlowFormer achieves average end-point errors (AEPE) of 1.159 and 2.088 on the clean and final passes, 16.5% and 15.5% lower than the best published results (1.388 and 2.47). FlowFormer also generalizes well: without being trained on Sintel, it achieves 1.01 AEPE on the clean pass of the Sintel training set, 21.7% lower than the best published result (1.29).
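The percentages quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the FlowFormer code: the helper names average_epe and relative_reduction are hypothetical, AEPE is assumed to be the mean Euclidean distance between predicted and ground-truth flow vectors, and the toy vectors are made up rather than benchmark data.

```python
import math


def average_epe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth flow.

    pred, gt: equal-length sequences of (u, v) flow vectors, one per pixel.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    total = sum(
        math.hypot(pu - gu, pv - gv)
        for (pu, pv), (gu, gv) in zip(pred, gt)
    )
    return total / len(pred)


def relative_reduction(baseline, new):
    """Fractional error reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline


# Toy flow field (made-up values): per-pixel errors are 1 and 5 pixels.
toy_pred = [(1.0, 0.0), (0.0, 2.0)]
toy_gt = [(0.0, 0.0), (3.0, -2.0)]
toy_aepe = average_epe(toy_pred, toy_gt)

# Figures quoted in the abstract: (previous best, FlowFormer).
clean = relative_reduction(1.388, 1.159)          # Sintel clean pass
final = relative_reduction(2.47, 2.088)           # Sintel final pass
generalization = relative_reduction(1.29, 1.01)   # Sintel train, clean pass

print(f"toy AEPE: {toy_aepe:.1f}")
print(f"clean: {clean:.1%}, final: {final:.1%}, "
      f"generalization: {generalization:.1%}")
```

Under these assumptions, the three ratios round to the 16.5%, 15.5%, and 21.7% reductions reported in the abstract.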

Results

Task	Dataset	Metric	Value	Model
Optical Flow Estimation	Sintel-clean	Average End-Point Error	1.16	FlowFormer
Optical Flow Estimation	KITTI 2015 (train)	EPE	4.09	FlowFormer
Optical Flow Estimation	KITTI 2015 (train)	F1-all	14.7	FlowFormer
Optical Flow Estimation	Spring	1px total	6.51	FlowFormer
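The F1-all and 1px rows are threshold-based outlier rates rather than average errors. The sketch below shows how such rates are commonly computed. It assumes the usual KITTI 2015 convention (a pixel counts as an outlier when its end-point error exceeds both 3 pixels and 5% of the ground-truth flow magnitude) and reads the 1px metric as the percentage of pixels with error above 1 pixel. Neither definition is stated on this page, the function name outlier_rate is hypothetical, and the toy vectors are made up.

```python
import math


def outlier_rate(pred, gt, abs_thresh, rel_thresh=None):
    """Percentage of pixels whose end-point error exceeds the threshold(s).

    abs_thresh: absolute error threshold in pixels.
    rel_thresh: optional threshold relative to the ground-truth magnitude;
                when given, a pixel must exceed BOTH thresholds to count.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    bad = 0
    for (pu, pv), (gu, gv) in zip(pred, gt):
        epe = math.hypot(pu - gu, pv - gv)
        is_bad = epe > abs_thresh
        if rel_thresh is not None:
            # Multiply instead of dividing so zero-magnitude flow is safe.
            is_bad = is_bad and epe > rel_thresh * math.hypot(gu, gv)
        bad += is_bad
    return 100.0 * bad / len(pred)


# Toy flow field (made-up values); per-pixel errors are 0.5, 4, 4, and 0.
toy_gt = [(10.0, 0.0), (100.0, 0.0), (0.0, 0.0), (1.0, 1.0)]
toy_pred = [(10.5, 0.0), (104.0, 0.0), (0.0, 4.0), (1.0, 1.0)]

f1_all = outlier_rate(toy_pred, toy_gt, abs_thresh=3.0, rel_thresh=0.05)
one_px = outlier_rate(toy_pred, toy_gt, abs_thresh=1.0)

print(f"F1-all style outlier rate: {f1_all:.1f}%")
print(f"1px style outlier rate: {one_px:.1f}%")
```

In the toy data, the second pixel has a 4-pixel error but is not an F1-all outlier under the assumed convention, because 4 pixels is below 5% of its ground-truth magnitude of 100 pixels.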

Related Papers

Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
An Efficient Approach for Muscle Segmentation and 3D Reconstruction Using Keypoint Tracking in MRI Scan (2025-07-11)
Learning to Track Any Points from Human Motion (2025-07-08)
TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation (2025-07-07)
MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation (2025-06-29)
EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting (2025-06-26)
WAFT: Warping-Alone Field Transforms for Optical Flow (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)