Dingquan Li, Tingting Jiang, Ming Jiang
Quality assessment of in-the-wild videos is a challenging problem because of the absence of reference videos and the presence of shooting distortions. Knowledge of the human visual system can help establish methods for objective quality assessment of in-the-wild videos. In this work, we show that two eminent effects of the human visual system, namely content-dependency and temporal-memory effects, can be used for this purpose. We propose an objective no-reference video quality assessment method that integrates both effects into a deep neural network. For content-dependency, we extract features from a pre-trained image classification neural network, owing to its inherent content-aware property. For temporal-memory effects, long-term dependencies, especially the temporal hysteresis, are integrated into the network with a gated recurrent unit and a subjectively-inspired temporal pooling layer. To validate the performance of our method, experiments are conducted on three publicly available in-the-wild video quality assessment databases: KoNViD-1k, CVD2014, and LIVE-Qualcomm. Experimental results demonstrate that our proposed method outperforms five state-of-the-art methods by a large margin, with overall performance improvements of 12.39%, 15.71%, 15.45%, and 18.09% over the second-best method VBLIINDS in terms of SROCC, KROCC, PLCC, and RMSE, respectively. Moreover, the ablation study verifies the crucial role of both the content-aware features and the modeling of temporal-memory effects. The PyTorch implementation of our method is released at https://github.com/lidq92/VSFA.
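The abstract outlines three architectural pieces: content-aware features from a pre-trained image classification CNN, a GRU for long-term temporal dependencies, and a subjectively-inspired temporal pooling layer. Below is a minimal PyTorch sketch of such a pipeline, not the authors' implementation (that is in the linked repository); the ResNet-50 backbone, the 128-d reduction, the 32-unit GRU, and the simplified running-minimum pooling are illustrative assumptions.

```python
# Minimal VSFA-style sketch (simplified, not the official implementation).
import torch
import torch.nn as nn
import torchvision.models as models


class ContentAwareFeatures(nn.Module):
    """Per-frame features from a pre-trained image classification CNN (ResNet-50)."""

    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool & fc
        for p in self.features.parameters():
            p.requires_grad = False  # keep the backbone fixed

    def forward(self, frames):              # frames: (T, 3, H, W)
        fmap = self.features(frames)        # (T, 2048, h, w)
        mean = fmap.mean(dim=[2, 3])        # spatial mean pooling
        std = fmap.std(dim=[2, 3])          # spatial std pooling
        return torch.cat([mean, std], dim=1)  # (T, 4096)


class QualityRegressor(nn.Module):
    """GRU over frame features, per-frame scores, and a simplified temporal pooling."""

    def __init__(self, in_dim=4096, hidden=32):
        super().__init__()
        self.reduce = nn.Linear(in_dim, 128)
        self.gru = nn.GRU(128, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, feats):                         # feats: (T, in_dim)
        x = torch.relu(self.reduce(feats)).unsqueeze(0)  # (1, T, 128)
        h, _ = self.gru(x)                             # (1, T, hidden)
        q = self.head(h).squeeze(-1).squeeze(0)        # per-frame quality, (T,)
        # Simplified hysteresis-style pooling: a running minimum over past frames
        # stands in for the paper's memory element; the video score is its mean.
        memory = torch.cummin(q, dim=0).values
        return memory.mean()


if __name__ == "__main__":
    frames = torch.rand(8, 3, 224, 224)   # 8 dummy frames
    feats = ContentAwareFeatures()(frames)
    score = QualityRegressor()(feats)
    print(float(score))
```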
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video Quality Assessment | MSU NR VQA Database | SROCC | 0.9049 | VSFA |
| Video Quality Assessment | MSU NR VQA Database | KLCC | 0.7483 | VSFA |
| Video Quality Assessment | MSU NR VQA Database | PLCC | 0.918 | VSFA |
| Video Quality Assessment | MSU SR-QA Dataset | SROCC | 0.53652 | VSFA |
| Video Quality Assessment | MSU SR-QA Dataset | KLCC | 0.43634 | VSFA |
| Video Quality Assessment | MSU SR-QA Dataset | PLCC | 0.54407 | VSFA |
| Video Quality Assessment | LIVE-VQC | PLCC | 0.7426 | VSFA |
| Video Quality Assessment | KoNViD-1k | PLCC | 0.7754 | VSFA |
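For reference, the correlation metrics in the table (SROCC for Spearman, KLCC/KROCC for Kendall, PLCC for Pearson) are conventionally computed between predicted quality scores and subjective mean opinion scores. A minimal sketch using scipy, with dummy score arrays standing in for real predictions and MOS values:

```python
# Sketch of the correlation metrics between predictions and MOS (dummy data).
import numpy as np
from scipy import stats

predicted = np.array([3.1, 2.4, 4.0, 1.8, 3.6])  # model outputs (dummy)
mos = np.array([3.0, 2.7, 4.2, 1.5, 3.8])        # subjective MOS (dummy)

srocc, _ = stats.spearmanr(predicted, mos)   # Spearman rank-order correlation
krocc, _ = stats.kendalltau(predicted, mos)  # Kendall rank-order correlation
plcc, _ = stats.pearsonr(predicted, mos)     # Pearson linear correlation
rmse = np.sqrt(np.mean((predicted - mos) ** 2))

print(f"SROCC={srocc:.4f} KROCC={krocc:.4f} PLCC={plcc:.4f} RMSE={rmse:.4f}")
```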