Chaofeng Chen, Jiadi Mo, Jingwen Hou, HaoNing Wu, Liang Liao, Wenxiu Sun, Qiong Yan, Weisi Lin
Image Quality Assessment (IQA) is a fundamental task in computer vision that has witnessed remarkable progress with deep neural networks. Inspired by the characteristics of the human visual system, existing methods typically use a combination of global and local representations (i.e., multi-scale features) to achieve superior performance. However, most of them adopt simple linear fusion of multi-scale features, neglecting the potentially complex relationships and interactions among them. In contrast, humans typically first form a global impression to locate important regions and then focus on local details in those regions. We therefore propose a top-down approach, named \emph{TOPIQ}, that uses high-level semantics to guide the IQA network to focus on semantically important local distortion regions. Specifically, we design a heuristic coarse-to-fine network (CFANet) that leverages multi-scale features and progressively propagates multi-level semantic information to low-level representations in a top-down manner. A key component of our approach is the proposed cross-scale attention mechanism, which calculates attention maps for lower-level features guided by higher-level features. This mechanism emphasizes active semantic regions for low-level distortions, thereby improving performance. CFANet can be used for both Full-Reference (FR) and No-Reference (NR) IQA. We use ResNet50 as its backbone and demonstrate that CFANet achieves better or competitive performance on most public FR and NR benchmarks compared with state-of-the-art methods based on vision transformers, while being much more efficient (with only ${\sim}13\%$ of the FLOPs of the current best FR method). Code is released at \url{https://github.com/chaofengc/IQA-PyTorch}.
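
The cross-scale attention idea described above can be illustrated with a minimal, dependency-free sketch. It is not the released CFANet code: the function names are hypothetical, learned projections and multi-head structure are omitted, and treating the higher-level features as queries over lower-level keys and values is an assumption based only on the phrase "guided by higher-level features".

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_scale_attention(high, low):
    """Attend over low-level features using high-level features as guidance.

    high: list of high-level feature vectors (assumed to act as queries)
    low:  list of low-level feature vectors (assumed to act as keys and values)
    All vectors share the same dimension d. Returns one vector per query,
    each a weighted average of the low-level vectors.
    """
    d = len(high[0])
    scale = 1.0 / math.sqrt(d)  # standard scaled dot-product factor
    outputs = []
    for query in high:
        # Similarity between this semantic query and every low-level location.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in low]
        weights = softmax(scores)  # attention map over low-level locations
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, low)) for j in range(d)]
        )
    return outputs


high_level = [[1.0, 0.0]]               # one coarse, semantic feature
low_level = [[10.0, 0.0], [0.0, 10.0]]  # two fine-grained local features
attended = cross_scale_attention(high_level, low_level)
```

In this toy input the single high-level vector aligns with the first low-level vector, so nearly all of the attention weight falls on that location. This is the sense in which high-level semantics select which local regions contribute to the result.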
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video Understanding | MSU SR-QA Dataset | KLCC | 0.5314 | TOPIQ trained on SPAQ (NR) |
| Video Understanding | MSU SR-QA Dataset | PLCC | 0.60905 | TOPIQ trained on SPAQ (NR) |
| Video Understanding | MSU SR-QA Dataset | SROCC | 0.64923 | TOPIQ trained on SPAQ (NR) |
| Video Understanding | MSU SR-QA Dataset | KLCC | 0.5067 | TOPIQ |
| Video Understanding | MSU SR-QA Dataset | PLCC | 0.57674 | TOPIQ |
| Video Understanding | MSU SR-QA Dataset | SROCC | 0.62715 | TOPIQ |
| Video Understanding | MSU SR-QA Dataset | KLCC | 0.48428 | TOPIQ FACE |
| Video Understanding | MSU SR-QA Dataset | PLCC | 0.58949 | TOPIQ FACE |
| Video Understanding | MSU SR-QA Dataset | SROCC | 0.59564 | TOPIQ FACE |
| Video Understanding | MSU SR-QA Dataset | KLCC | 0.46217 | TOPIQ |
| Video Understanding | MSU SR-QA Dataset | PLCC | 0.57955 | TOPIQ |
| Video Understanding | MSU SR-QA Dataset | SROCC | 0.57341 | TOPIQ |
| Video Understanding | MSU SR-QA Dataset | KLCC | 0.42811 | TOPIQ trained on PIPAL |
| Video Understanding | MSU SR-QA Dataset | PLCC | 0.57564 | TOPIQ trained on PIPAL |
| Video Understanding | MSU SR-QA Dataset | SROCC | 0.55568 | TOPIQ trained on PIPAL |
| Video Understanding | MSU SR-QA Dataset | KLCC | 0.40663 | TOPIQ (IAA) |
| Video Understanding | MSU SR-QA Dataset | PLCC | 0.51061 | TOPIQ (IAA) |
| Video Understanding | MSU SR-QA Dataset | SROCC | 0.51687 | TOPIQ (IAA) |
| Video Understanding | MSU SR-QA Dataset | KLCC | 0.28473 | TOPIQ + Res50 (IAA) |
| Video Understanding | MSU SR-QA Dataset | PLCC | 0.34 | TOPIQ + Res50 (IAA) |
| Video Understanding | MSU SR-QA Dataset | SROCC | 0.36204 | TOPIQ + Res50 (IAA) |
| Video Understanding | MSU SR-QA Dataset | KLCC | 0.26774 | TOPIQ trained on FLIVE |
| Video Understanding | MSU SR-QA Dataset | PLCC | 0.3394 | TOPIQ trained on FLIVE |
| Video Understanding | MSU SR-QA Dataset | SROCC | 0.34092 | TOPIQ trained on FLIVE |
| Video Quality Assessment | MSU SR-QA Dataset | KLCC | 0.5314 | TOPIQ trained on SPAQ (NR) |
| Video Quality Assessment | MSU SR-QA Dataset | PLCC | 0.60905 | TOPIQ trained on SPAQ (NR) |
| Video Quality Assessment | MSU SR-QA Dataset | SROCC | 0.64923 | TOPIQ trained on SPAQ (NR) |
| Video Quality Assessment | MSU SR-QA Dataset | KLCC | 0.5067 | TOPIQ |
| Video Quality Assessment | MSU SR-QA Dataset | PLCC | 0.57674 | TOPIQ |
| Video Quality Assessment | MSU SR-QA Dataset | SROCC | 0.62715 | TOPIQ |
| Video Quality Assessment | MSU SR-QA Dataset | KLCC | 0.48428 | TOPIQ FACE |
| Video Quality Assessment | MSU SR-QA Dataset | PLCC | 0.58949 | TOPIQ FACE |
| Video Quality Assessment | MSU SR-QA Dataset | SROCC | 0.59564 | TOPIQ FACE |
| Video Quality Assessment | MSU SR-QA Dataset | KLCC | 0.46217 | TOPIQ |
| Video Quality Assessment | MSU SR-QA Dataset | PLCC | 0.57955 | TOPIQ |
| Video Quality Assessment | MSU SR-QA Dataset | SROCC | 0.57341 | TOPIQ |
| Video Quality Assessment | MSU SR-QA Dataset | KLCC | 0.42811 | TOPIQ trained on PIPAL |
| Video Quality Assessment | MSU SR-QA Dataset | PLCC | 0.57564 | TOPIQ trained on PIPAL |
| Video Quality Assessment | MSU SR-QA Dataset | SROCC | 0.55568 | TOPIQ trained on PIPAL |
| Video Quality Assessment | MSU SR-QA Dataset | KLCC | 0.40663 | TOPIQ (IAA) |
| Video Quality Assessment | MSU SR-QA Dataset | PLCC | 0.51061 | TOPIQ (IAA) |
| Video Quality Assessment | MSU SR-QA Dataset | SROCC | 0.51687 | TOPIQ (IAA) |
| Video Quality Assessment | MSU SR-QA Dataset | KLCC | 0.28473 | TOPIQ + Res50 (IAA) |
| Video Quality Assessment | MSU SR-QA Dataset | PLCC | 0.34 | TOPIQ + Res50 (IAA) |
| Video Quality Assessment | MSU SR-QA Dataset | SROCC | 0.36204 | TOPIQ + Res50 (IAA) |
| Video Quality Assessment | MSU SR-QA Dataset | KLCC | 0.26774 | TOPIQ trained on FLIVE |
| Video Quality Assessment | MSU SR-QA Dataset | PLCC | 0.3394 | TOPIQ trained on FLIVE |
| Video Quality Assessment | MSU SR-QA Dataset | SROCC | 0.34092 | TOPIQ trained on FLIVE |
| Video | MSU SR-QA Dataset | KLCC | 0.5314 | TOPIQ trained on SPAQ (NR) |
| Video | MSU SR-QA Dataset | PLCC | 0.60905 | TOPIQ trained on SPAQ (NR) |
| Video | MSU SR-QA Dataset | SROCC | 0.64923 | TOPIQ trained on SPAQ (NR) |
| Video | MSU SR-QA Dataset | KLCC | 0.5067 | TOPIQ |
| Video | MSU SR-QA Dataset | PLCC | 0.57674 | TOPIQ |
| Video | MSU SR-QA Dataset | SROCC | 0.62715 | TOPIQ |
| Video | MSU SR-QA Dataset | KLCC | 0.48428 | TOPIQ FACE |
| Video | MSU SR-QA Dataset | PLCC | 0.58949 | TOPIQ FACE |
| Video | MSU SR-QA Dataset | SROCC | 0.59564 | TOPIQ FACE |
| Video | MSU SR-QA Dataset | KLCC | 0.46217 | TOPIQ |
| Video | MSU SR-QA Dataset | PLCC | 0.57955 | TOPIQ |
| Video | MSU SR-QA Dataset | SROCC | 0.57341 | TOPIQ |
| Video | MSU SR-QA Dataset | KLCC | 0.42811 | TOPIQ trained on PIPAL |
| Video | MSU SR-QA Dataset | PLCC | 0.57564 | TOPIQ trained on PIPAL |
| Video | MSU SR-QA Dataset | SROCC | 0.55568 | TOPIQ trained on PIPAL |
| Video | MSU SR-QA Dataset | KLCC | 0.40663 | TOPIQ (IAA) |
| Video | MSU SR-QA Dataset | PLCC | 0.51061 | TOPIQ (IAA) |
| Video | MSU SR-QA Dataset | SROCC | 0.51687 | TOPIQ (IAA) |
| Video | MSU SR-QA Dataset | KLCC | 0.28473 | TOPIQ + Res50 (IAA) |
| Video | MSU SR-QA Dataset | PLCC | 0.34 | TOPIQ + Res50 (IAA) |
| Video | MSU SR-QA Dataset | SROCC | 0.36204 | TOPIQ + Res50 (IAA) |
| Video | MSU SR-QA Dataset | KLCC | 0.26774 | TOPIQ trained on FLIVE |
| Video | MSU SR-QA Dataset | PLCC | 0.3394 | TOPIQ trained on FLIVE |
| Video | MSU SR-QA Dataset | SROCC | 0.34092 | TOPIQ trained on FLIVE |
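
For readers unfamiliar with the three metrics in the table, the sketch below shows how such correlation scores between predicted quality and subjective scores could be computed. It is an illustration only, not the benchmark's evaluation code: the function names and the toy data are hypothetical, KLCC is read here as Kendall's rank correlation, and the tau-b variant is an assumption.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values receive the mean rank of their group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall rank correlation, tau-b variant (assumed meaning of KLCC)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # pairs tied in both variables are ignored
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    denom = math.sqrt((pairs + tied_x_only) * (pairs + tied_y_only))
    return (concordant - discordant) / denom


# Hypothetical toy data: model predictions and mean opinion scores.
predicted = [0.2, 0.5, 0.4, 0.9]
subjective = [1.0, 3.0, 2.5, 4.5]
plcc = pearson(predicted, subjective)
srocc = spearman(predicted, subjective)
klcc = kendall_tau_b(predicted, subjective)
```

Because the toy predictions preserve the ordering of the subjective scores, both rank-based measures reach their maximum, while PLCC additionally depends on how linear the relationship is.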