Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, Feng Yang
Image quality assessment (IQA) is an important research topic for understanding and improving visual experience. The current state-of-the-art IQA methods are based on convolutional neural networks (CNNs). The performance of CNN-based models is often compromised by the fixed-shape constraint of batch training: to satisfy it, input images are usually resized and cropped to a fixed shape, which degrades image quality. To address this, we design a multi-scale image quality Transformer (MUSIQ) that processes native-resolution images with varying sizes and aspect ratios. With a multi-scale image representation, our proposed method can capture image quality at different granularities. Furthermore, a novel hash-based 2D spatial embedding and a scale embedding are proposed to support positional embedding in the multi-scale representation. Experimental results verify that our method achieves state-of-the-art performance on multiple large-scale IQA datasets such as PaQ-2-PiQ, SPAQ, and KonIQ-10k.
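
The abstract names a hash-based 2D spatial embedding but does not spell out the hash. The sketch below shows one plausible reading, offered as an assumption rather than the authors' implementation: each patch position in a variable-size patch grid is mapped proportionally onto a fixed `grid_size` x `grid_size` table, so images of any resolution or aspect ratio index the same set of learnable embeddings. The function names, the patch size of 32, and the grid size of 10 are all hypothetical choices made for illustration.

```python
import math


def patch_grid_shape(height, width, patch_size):
    """Number of patch rows and columns needed to cover an image (last patch may be padded)."""
    return math.ceil(height / patch_size), math.ceil(width / patch_size)


def hash_to_grid(row, col, n_rows, n_cols, grid_size):
    """Map patch (row, col) of an n_rows x n_cols patch grid to a cell of a fixed grid.

    Assumption: a proportional mapping; the abstract does not define the hash.
    """
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("patch position outside the patch grid")
    return row * grid_size // n_rows, col * grid_size // n_cols


def spatial_indices(height, width, patch_size=32, grid_size=10):
    """Fixed-grid cell for every patch of an image, in row-major order."""
    n_rows, n_cols = patch_grid_shape(height, width, patch_size)
    return [
        hash_to_grid(r, c, n_rows, n_cols, grid_size)
        for r in range(n_rows)
        for c in range(n_cols)
    ]


# A 64 x 128 image yields a 2 x 4 patch grid; every patch lands inside the 10 x 10 table.
cells = spatial_indices(64, 128)
```

Under this reading, patches at the same relative location in a small image and a large image share an embedding row, which is what lets a single table serve inputs of varying sizes and aspect ratios.
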
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video Understanding | MSU NR VQA Database | KLCC | 0.7433 | MUSIQ |
| Video Understanding | MSU NR VQA Database | PLCC | 0.9068 | MUSIQ |
| Video Understanding | MSU NR VQA Database | SRCC | 0.9004 | MUSIQ |
| Video Understanding | MSU SR-QA Dataset | KLCC | 0.55312 | MUSIQ trained on PaQ-2-PiQ |
| Video Understanding | MSU SR-QA Dataset | PLCC | 0.66531 | MUSIQ trained on PaQ-2-PiQ |
| Video Understanding | MSU SR-QA Dataset | SROCC | 0.67746 | MUSIQ trained on PaQ-2-PiQ |
| Video Understanding | MSU SR-QA Dataset | KLCC | 0.52673 | MUSIQ trained on SPAQ |
| Video Understanding | MSU SR-QA Dataset | PLCC | 0.60216 | MUSIQ trained on SPAQ |
| Video Understanding | MSU SR-QA Dataset | SROCC | 0.64927 | MUSIQ trained on SPAQ |
| Video Understanding | MSU SR-QA Dataset | KLCC | 0.51897 | MUSIQ trained on KONIQ |
| Video Understanding | MSU SR-QA Dataset | PLCC | 0.59151 | MUSIQ trained on KONIQ |
| Video Understanding | MSU SR-QA Dataset | SROCC | 0.64589 | MUSIQ trained on KONIQ |
| Video Understanding | MSU SR-QA Dataset | KLCC | 0.44669 | MUSIQ trained on AVA |
| Video Understanding | MSU SR-QA Dataset | PLCC | 0.52404 | MUSIQ trained on AVA |
| Video Understanding | MSU SR-QA Dataset | SROCC | 0.56152 | MUSIQ trained on AVA |
| Video Quality Assessment | MSU NR VQA Database | KLCC | 0.7433 | MUSIQ |
| Video Quality Assessment | MSU NR VQA Database | PLCC | 0.9068 | MUSIQ |
| Video Quality Assessment | MSU NR VQA Database | SRCC | 0.9004 | MUSIQ |
| Video Quality Assessment | MSU SR-QA Dataset | KLCC | 0.55312 | MUSIQ trained on PaQ-2-PiQ |
| Video Quality Assessment | MSU SR-QA Dataset | PLCC | 0.66531 | MUSIQ trained on PaQ-2-PiQ |
| Video Quality Assessment | MSU SR-QA Dataset | SROCC | 0.67746 | MUSIQ trained on PaQ-2-PiQ |
| Video Quality Assessment | MSU SR-QA Dataset | KLCC | 0.52673 | MUSIQ trained on SPAQ |
| Video Quality Assessment | MSU SR-QA Dataset | PLCC | 0.60216 | MUSIQ trained on SPAQ |
| Video Quality Assessment | MSU SR-QA Dataset | SROCC | 0.64927 | MUSIQ trained on SPAQ |
| Video Quality Assessment | MSU SR-QA Dataset | KLCC | 0.51897 | MUSIQ trained on KONIQ |
| Video Quality Assessment | MSU SR-QA Dataset | PLCC | 0.59151 | MUSIQ trained on KONIQ |
| Video Quality Assessment | MSU SR-QA Dataset | SROCC | 0.64589 | MUSIQ trained on KONIQ |
| Video Quality Assessment | MSU SR-QA Dataset | KLCC | 0.44669 | MUSIQ trained on AVA |
| Video Quality Assessment | MSU SR-QA Dataset | PLCC | 0.52404 | MUSIQ trained on AVA |
| Video Quality Assessment | MSU SR-QA Dataset | SROCC | 0.56152 | MUSIQ trained on AVA |
| Image Quality Assessment | MSU NR VQA Database | KLCC | 0.7433 | MUSIQ |
| Image Quality Assessment | MSU NR VQA Database | PLCC | 0.9068 | MUSIQ |
| Image Quality Assessment | MSU NR VQA Database | SRCC | 0.9004 | MUSIQ |
| Video | MSU NR VQA Database | KLCC | 0.7433 | MUSIQ |
| Video | MSU NR VQA Database | PLCC | 0.9068 | MUSIQ |
| Video | MSU NR VQA Database | SRCC | 0.9004 | MUSIQ |
| Video | MSU SR-QA Dataset | KLCC | 0.55312 | MUSIQ trained on PaQ-2-PiQ |
| Video | MSU SR-QA Dataset | PLCC | 0.66531 | MUSIQ trained on PaQ-2-PiQ |
| Video | MSU SR-QA Dataset | SROCC | 0.67746 | MUSIQ trained on PaQ-2-PiQ |
| Video | MSU SR-QA Dataset | KLCC | 0.52673 | MUSIQ trained on SPAQ |
| Video | MSU SR-QA Dataset | PLCC | 0.60216 | MUSIQ trained on SPAQ |
| Video | MSU SR-QA Dataset | SROCC | 0.64927 | MUSIQ trained on SPAQ |
| Video | MSU SR-QA Dataset | KLCC | 0.51897 | MUSIQ trained on KONIQ |
| Video | MSU SR-QA Dataset | PLCC | 0.59151 | MUSIQ trained on KONIQ |
| Video | MSU SR-QA Dataset | SROCC | 0.64589 | MUSIQ trained on KONIQ |
| Video | MSU SR-QA Dataset | KLCC | 0.44669 | MUSIQ trained on AVA |
| Video | MSU SR-QA Dataset | PLCC | 0.52404 | MUSIQ trained on AVA |
| Video | MSU SR-QA Dataset | SROCC | 0.56152 | MUSIQ trained on AVA |
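
The metrics in the table are correlation coefficients between predicted quality scores and subjective scores: PLCC (Pearson linear), SRCC/SROCC (Spearman rank-order), and KLCC (Kendall rank). As a minimal sketch of how such values are computed, the pure-Python functions below implement the three coefficients; the score lists are made-up illustration data, not results from the paper or the benchmarks, and the tau-b variant for Kendall is an assumption about tie handling.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation (SRCC/SROCC): Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall rank correlation (KLCC), tau-b variant, O(n^2) pair scan."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both factors
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt(
        (untied + only_x_tied) * (untied + only_y_tied)
    )


# Hypothetical model predictions and subjective scores (illustration only).
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
subjective = [1.0, 2.0, 3.0, 5.0, 4.0]
plcc = pearson(predicted, subjective)
srcc = spearman(predicted, subjective)
klcc = kendall_tau_b(predicted, subjective)
```

The two rank-based metrics depend only on ordering, so any monotonic rescaling of the predictions leaves them unchanged, while PLCC also reflects how linear the relationship is.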