Recurrence without Recurrence: Stable Video Landmark Detection with Deep Equilibrium Models

Paul Micaelli, Arash Vahdat, Hongxu Yin, Jan Kautz, Pavlo Molchanov

2023-04-02, CVPR 2023, Face Alignment

Abstract

Cascaded computation, whereby predictions are recurrently refined over several stages, has been a persistent theme throughout the development of landmark detection models. In this work, we show that the recently proposed Deep Equilibrium Model (DEQ) can be naturally adapted to this form of computation. Our Landmark DEQ (LDEQ) achieves state-of-the-art performance on the challenging WFLW facial landmark dataset, reaching $3.92$ NME with fewer parameters and a training memory cost of $\mathcal{O}(1)$ in the number of recurrent modules. Furthermore, we show that DEQs are particularly suited for landmark detection in videos. In this setting, it is typical to train on still images due to the lack of labelled videos. This can lead to a "flickering" effect at inference time on video, whereby a model rapidly oscillates between different plausible solutions across consecutive frames. By rephrasing DEQs as a constrained optimization, we emulate recurrence at inference time, despite not having access to temporal data at training time. This Recurrence without Recurrence (RwR) paradigm helps reduce landmark flicker, which we demonstrate by introducing a new metric, normalized mean flicker (NMF), and contributing a new facial landmark video dataset (WFLW-V) targeting landmark uncertainty. On the WFLW-V hard subset made up of $500$ videos, our LDEQ with RwR improves the NME and NMF by $10\%$ and $13\%$ respectively, compared to the strongest previously published model using a hand-tuned conventional filter.
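The fixed-point view behind a DEQ can be illustrated with a toy example. The sketch below is not the LDEQ architecture and does not reproduce the paper's RwR objective. It assumes a hypothetical contractive layer `f(z, x) = tanh(W z + x)`, a plain fixed-point iteration as the solver, and two nearly identical inputs standing in for consecutive video frames. It only shows the general idea that a model defined by an equilibrium $z^* = f(z^*, x)$ can reuse the previous frame's equilibrium as the starting point for the next frame, without any temporal training data.

```python
import numpy as np


def f(z, x, W):
    """Hypothetical implicit layer: one refinement step of the state z given input x."""
    return np.tanh(W @ z + x)


def solve_fixed_point(x, W, z0, tol=1e-8, max_iter=500):
    """Naive fixed-point iteration z <- f(z, x) until the update is smaller than tol.

    Returns the approximate equilibrium and the number of iterations used.
    """
    z = z0
    for k in range(1, max_iter + 1):
        z_next = f(z, x, W)
        if np.linalg.norm(z_next - z) < tol:
            return z_next, k
        z = z_next
    return z, max_iter


rng = np.random.default_rng(0)
dim = 8

# Scale W to spectral norm 0.5 so that f is a contraction in z
# (tanh is 1-Lipschitz), which guarantees a unique fixed point.
W = rng.standard_normal((dim, dim))
W *= 0.5 / np.linalg.norm(W, 2)

# Two "frames": the second input is a small perturbation of the first.
x_prev = rng.standard_normal(dim)
x_next = x_prev + 0.01 * rng.standard_normal(dim)

# Equilibrium for the previous frame, solved from a zero initialization.
z_prev, _ = solve_fixed_point(x_prev, W, np.zeros(dim))

# Next frame: cold start (zeros) versus warm start (previous equilibrium).
z_cold, iters_cold = solve_fixed_point(x_next, W, np.zeros(dim))
z_warm, iters_warm = solve_fixed_point(x_next, W, z_prev)

print("cold-start iterations:", iters_cold)
print("warm-start iterations:", iters_warm)
```

In this toy setting both starts reach the same equilibrium because the map is a contraction. The warm start simply begins closer to it. A real landmark model may admit several plausible solutions per frame, which is where the choice of starting point could matter for flicker.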

Results

Task	Dataset	Metric	Value	Model
Facial Recognition and Modelling	WFLW	AUC@10 (inter-ocular)	62.4	LDEQ
Facial Recognition and Modelling	WFLW	FR@10 (inter-ocular)	2.48	LDEQ
Facial Recognition and Modelling	WFLW	NME (inter-ocular)	3.92	LDEQ
Face Reconstruction	WFLW	AUC@10 (inter-ocular)	62.4	LDEQ
Face Reconstruction	WFLW	FR@10 (inter-ocular)	2.48	LDEQ
Face Reconstruction	WFLW	NME (inter-ocular)	3.92	LDEQ
3D	WFLW	AUC@10 (inter-ocular)	62.4	LDEQ
3D	WFLW	FR@10 (inter-ocular)	2.48	LDEQ
3D	WFLW	NME (inter-ocular)	3.92	LDEQ
3D Face Modelling	WFLW	AUC@10 (inter-ocular)	62.4	LDEQ
3D Face Modelling	WFLW	FR@10 (inter-ocular)	2.48	LDEQ
3D Face Modelling	WFLW	NME (inter-ocular)	3.92	LDEQ
3D Face Reconstruction	WFLW	AUC@10 (inter-ocular)	62.4	LDEQ
3D Face Reconstruction	WFLW	FR@10 (inter-ocular)	2.48	LDEQ
3D Face Reconstruction	WFLW	NME (inter-ocular)	3.92	LDEQ
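
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how an inter-ocular NME and a failure rate at a 10% threshold are commonly computed. It follows the usual definition (mean per-landmark Euclidean error divided by the distance between two eye landmarks, expressed in percent). It is not taken from the paper's evaluation code, and the function names and the eye-index arguments are assumptions for illustration. AUC@10 is omitted.

```python
import numpy as np


def nme_inter_ocular(pred, target, left_eye_idx, right_eye_idx):
    """Per-sample normalized mean error, in percent.

    pred, target: arrays of shape (num_samples, num_landmarks, 2).
    left_eye_idx, right_eye_idx: indices of the two ground-truth landmarks
    whose distance is used as the normalizer (dataset-specific; assumed here).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    # Euclidean error of every landmark: shape (num_samples, num_landmarks).
    per_point = np.linalg.norm(pred - target, axis=-1)
    # Inter-ocular distance of every sample: shape (num_samples,).
    d = np.linalg.norm(target[:, left_eye_idx] - target[:, right_eye_idx], axis=-1)
    return 100.0 * per_point.mean(axis=1) / d


def failure_rate(nme_percent, threshold=10.0):
    """Percentage of samples whose NME exceeds the threshold (FR@10 by default)."""
    return 100.0 * np.mean(np.asarray(nme_percent) > threshold)


# Tiny synthetic example: two samples, two landmarks that double as the "eyes".
target = np.array([[[0.0, 0.0], [10.0, 0.0]],
                   [[0.0, 0.0], [10.0, 0.0]]])
pred = target.copy()
pred[0, :, 1] += 0.5   # every landmark of sample 0 is off by 0.5 pixels
pred[1, :, 1] += 2.0   # every landmark of sample 1 is off by 2.0 pixels

nme = nme_inter_ocular(pred, target, left_eye_idx=0, right_eye_idx=1)
fr = failure_rate(nme)
print("per-sample NME (%):", nme)
print("FR@10 (%):", fr)
```

With an inter-ocular distance of 10 pixels, the two samples have errors of 5% and 20% of that distance, so one of the two exceeds the 10% threshold.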
