Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment

Xinyi Wang, Angeliki Katsenou, David Bull

2024-07-16 · Video Compression · Optical Flow Estimation · Video Quality Assessment · Visual Question Answering (VQA)
Paper · PDF · Code (official)

Abstract

With the rapid growth of User-Generated Content (UGC) exchanged between users and sharing platforms, the need for video quality assessment in the wild is increasingly evident. UGC is typically acquired using consumer devices and undergoes multiple rounds of compression (transcoding) before reaching the end user. Therefore, traditional quality metrics that employ the original content as a reference are not suitable. In this paper, we propose ReLaX-VQA, a novel No-Reference Video Quality Assessment (NR-VQA) model that aims to address the challenges of evaluating the quality of diverse video content without reference to the original uncompressed videos. ReLaX-VQA uses frame differences to select spatio-temporal fragments intelligently together with different expressions of spatial features associated with the sampled frames. These are then used to better capture spatial and temporal variabilities in the quality of neighbouring frames. Furthermore, the model enhances abstraction by employing layer-stacking techniques in deep neural network features from Residual Networks and Vision Transformers. Extensive testing across four UGC datasets demonstrates that ReLaX-VQA consistently outperforms existing NR-VQA methods, achieving an average SRCC of 0.8658 and PLCC of 0.8873. Open-source code and trained models that will facilitate further research and applications of NR-VQA can be found at https://github.com/xinyiW915/ReLaX-VQA.
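The abstract describes selecting spatio-temporal fragments using frame differences: patches where consecutive frames differ most carry the strongest temporal quality signal. The sketch below is one plausible reading of that step, assuming grayscale frames and a simple residual-energy ranking; the function name, patch size, and scoring are illustrative, not the paper's actual implementation (see the official repository for that).

```python
import numpy as np

def sample_fragments(prev_frame, curr_frame, patch=32, k=4):
    """Illustrative sketch: return the k patches of curr_frame whose
    inter-frame residual (|curr - prev|) has the largest energy.
    A hypothetical stand-in for ReLaX-VQA's frame-difference-based
    fragment selection, not the official code."""
    diff = np.abs(curr_frame.astype(np.float32) - prev_frame.astype(np.float32))
    h, w = diff.shape
    scored = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            # residual energy of this patch = sum of absolute frame differences
            scored.append((float(diff[y:y + patch, x:x + patch].sum()), y, x))
    scored.sort(reverse=True)  # highest-motion patches first
    return [curr_frame[y:y + patch, x:x + patch] for _, y, x in scored[:k]]
```

In the full model these sampled fragments would then be passed through the ResNet/ViT feature extractors described in the abstract.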

Results

Task                     | Dataset     | Metric | Value  | Model
Video Quality Assessment | LIVE-VQC    | PLCC   | 0.8876 | ReLaX-VQA (finetuned on LIVE-VQC)
Video Quality Assessment | LIVE-VQC    | PLCC   | 0.8242 | ReLaX-VQA (trained on LSVQ only)
Video Quality Assessment | LIVE-VQC    | PLCC   | 0.8079 | ReLaX-VQA
Video Quality Assessment | YouTube-UGC | PLCC   | 0.8652 | ReLaX-VQA (finetuned on YouTube-UGC)
Video Quality Assessment | YouTube-UGC | PLCC   | 0.8354 | ReLaX-VQA (trained on LSVQ only)
Video Quality Assessment | YouTube-UGC | PLCC   | 0.8204 | ReLaX-VQA
Video Quality Assessment | KoNViD-1k   | PLCC   | 0.8668 | ReLaX-VQA (finetuned on KoNViD-1k)
Video Quality Assessment | KoNViD-1k   | PLCC   | 0.8473 | ReLaX-VQA
Video Quality Assessment | KoNViD-1k   | PLCC   | 0.8427 | ReLaX-VQA (trained on LSVQ only)
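The table reports PLCC, and the abstract also cites SRCC; these are the standard agreement measures between predicted quality scores and mean opinion scores (MOS). A minimal numpy-only sketch of both (Spearman computed without tie correction, which is adequate for continuous scores):

```python
import numpy as np

def plcc(pred, mos):
    """Pearson Linear Correlation Coefficient between predictions and MOS."""
    pred, mos = np.asarray(pred, float), np.asarray(mos, float)
    return float(np.corrcoef(pred, mos)[0, 1])

def srcc(pred, mos):
    """Spearman Rank Correlation Coefficient: Pearson correlation of the
    rank orders. No tie correction (assumes distinct scores)."""
    rank = lambda a: np.argsort(np.argsort(np.asarray(a))).astype(float)
    return plcc(rank(pred), rank(mos))
```

PLCC rewards linear agreement with MOS, while SRCC rewards correct ranking of videos by quality even when the mapping is nonlinear; VQA papers typically report both, as the abstract's 0.8658 SRCC / 0.8873 PLCC averages do.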

Related Papers

Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM (2025-07-16)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
An Efficient Approach for Muscle Segmentation and 3D Reconstruction Using Keypoint Tracking in MRI Scan (2025-07-11)
Evaluating Attribute Confusion in Fashion Text-to-Image Generation (2025-07-09)
LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation (2025-07-09)
GSVR: 2D Gaussian-based Video Representation for 800+ FPS with Hybrid Deformation Field (2025-07-08)