Abhijay Ghildyal, Feng Liu
Existing perceptual similarity metrics assume that an image and its reference are well aligned. As a result, these metrics are often sensitive to small alignment errors that are imperceptible to the human eye. This paper studies the effect of small misalignment, specifically a small shift between the input and reference images, on existing metrics, and accordingly develops a shift-tolerant similarity metric. This paper builds upon LPIPS, a widely used learned perceptual similarity metric, and explores architectural design considerations to make it robust against imperceptible misalignment. Specifically, we study a wide spectrum of neural network elements, such as anti-aliasing filtering, pooling, striding, padding, and skip connections, and discuss their roles in making a robust metric. Based on our studies, we develop a new deep neural network-based perceptual similarity metric. Our experiments show that our metric is tolerant to imperceptible shifts while remaining consistent with human similarity judgments.
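To make the role of anti-aliasing filtering and striding more concrete, here is a toy 1-D sketch in plain Python. It is not the authors' architecture: the functions are hypothetical, and the `[0.25, 0.5, 0.25]` binomial kernel is an assumed low-pass filter in the spirit of blur pooling. The sketch compares how much a downsampled representation changes when the input is shifted by one sample, with and without blurring before the stride.

```python
# Toy 1-D illustration (not the paper's architecture): naive striding makes a
# downsampled representation sensitive to a one-sample shift, and low-pass
# filtering before striding (anti-aliasing) can reduce that sensitivity.


def shift(signal, k):
    """Circularly shift a 1-D signal to the right by k samples."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]


def subsample(signal, stride=2):
    """Naive strided downsampling: keep every `stride`-th sample."""
    return signal[::stride]


def blur(signal, kernel=(0.25, 0.5, 0.25)):
    """Circular convolution with a small binomial low-pass kernel (assumed)."""
    n, r = len(signal), len(kernel) // 2
    return [
        sum(w * signal[(i + j - r) % n] for j, w in enumerate(kernel))
        for i in range(n)
    ]


def blur_subsample(signal, stride=2):
    """Anti-aliased downsampling: low-pass filter first, then stride."""
    return subsample(blur(signal), stride)


def l2(a, b):
    """Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def shift_sensitivity(op, signal, k=1):
    """Distance between op(signal) and op(signal shifted by k samples)."""
    return l2(op(signal), op(shift(signal, k)))


x = [0, 1] * 4  # high-frequency pattern: 0, 1, 0, 1, ...
print(l2(x, shift(x, 1)))                    # raw pixel-wise distance → 2.8284271247461903
print(shift_sensitivity(subsample, x))       # naive striding → 2.0
print(shift_sensitivity(blur_subsample, x))  # blur before striding → 0.0
```

The alternating pattern is an extreme case: this kernel removes its frequency entirely, so the blurred version is exactly shift-invariant here. For general signals, blurring before striding reduces the shift sensitivity rather than eliminating it.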
Results on the MSU SR-QA Dataset (the same scores are listed under the Video, Video Understanding, and Video Quality Assessment tasks):

| Model | KLCC | PLCC | SROCC |
|---|---|---|---|
| ST-LPIPS (VGG) | 0.45898 | 0.56431 | 0.57336 |
| ST-LPIPS (Alex) | 0.42897 | 0.5474 | 0.53473 |
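The table reports correlation coefficients between metric outputs and subjective scores. The sketch below computes them in plain Python, assuming that KLCC denotes Kendall's rank correlation. It implements the tau-b variant, which is also SciPy's default in `scipy.stats.kendalltau`. The benchmark's exact evaluation protocol is not described here: any nonlinear fitting applied before PLCC, and how it handles sign for distance-style metrics where lower means more similar, are both unknown. The inputs are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def average_ranks(v):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall rank correlation, tau-b variant (adjusts for ties)."""
    n = len(x)
    n0 = n * (n - 1) // 2
    s = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                s += 1  # concordant pair
            elif dx * dy < 0:
                s -= 1  # discordant pair
    return s / math.sqrt((n0 - tied_x) * (n0 - tied_y))


scores = [1, 2, 3, 4]  # hypothetical metric outputs
mos = [1, 4, 9, 16]    # hypothetical subjective scores
print(round(pearson(scores, mos), 4))  # → 0.9844
print(spearman(scores, mos))           # → 1.0
print(kendall_tau_b(scores, mos))      # → 1.0
```

The example shows why the table reports both linear and rank correlations. A monotonic but nonlinear relationship gives perfect rank correlations (SROCC and Kendall) while PLCC stays below 1.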