Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives

HaoNing Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

Published: 2022-11-09 (ICCV 2023)
Tasks: Disentanglement, Video Quality Assessment, Visual Question Answering (VQA), Video Generation

Abstract

The rapid increase in user-generated content (UGC) videos calls for the development of effective video quality assessment (VQA) algorithms. However, the objective of the UGC-VQA problem is still ambiguous and can be viewed from two perspectives: the technical perspective, which measures the perception of distortions, and the aesthetic perspective, which relates to preference for and recommendation of content. To understand how these two perspectives affect overall subjective opinions in UGC-VQA, we conduct a large-scale subjective study to collect human quality opinions on the overall quality of videos as well as perceptions from the aesthetic and technical perspectives. The collected Disentangled Video Quality Database (DIVIDE-3k) confirms that human quality opinions on UGC videos are universally and inevitably affected by both aesthetic and technical perspectives. In light of this, we propose the Disentangled Objective Video Quality Evaluator (DOVER) to learn the quality of UGC videos based on the two perspectives. DOVER achieves state-of-the-art performance in UGC-VQA with very high efficiency. With the perspective opinions in DIVIDE-3k, we further propose DOVER++, the first approach to provide reliable, clear-cut quality evaluations from a single aesthetic or technical perspective. Code at https://github.com/VQAssessment/DOVER.

Results

The same results are listed under three task labels: Video Understanding, Video Quality Assessment, and Video.

Dataset              Metric  Value   Model
MSU NR VQA Database  KLCC    0.7216  DOVER
MSU NR VQA Database  PLCC    0.9099  DOVER
MSU NR VQA Database  SRCC    0.8871  DOVER
LIVE-VQC             PLCC    0.874   DOVER (end-to-end)
LIVE-VQC             PLCC    0.863   DOVER (head-only)
YouTube-UGC          PLCC    0.874   DOVER (end-to-end)
YouTube-UGC          PLCC    0.862   DOVER (head-only)
KoNViD-1k            PLCC    0.905   DOVER (end-to-end)
KoNViD-1k            PLCC    0.894   DOVER (head-only)
LIVE-FB LSVQ         PLCC    0.889   DOVER
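
The metrics reported above compare predicted quality scores against human opinion scores. As a rough illustration, the sketch below computes three correlation coefficients in plain Python. It assumes that PLCC means Pearson linear correlation, SRCC means Spearman rank correlation, and KLCC means Kendall rank correlation; the page does not define these abbreviations. The function names and the toy scores are hypothetical and are not taken from the DOVER code base. Some VQA evaluations fit a nonlinear mapping to the predictions before computing PLCC; this sketch does not.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation (assumed meaning of PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (assumed meaning of SRCC)."""
    return pearson(ranks(x), ranks(y))


def kendall(x, y):
    """Kendall tau-a, without tie correction (assumed meaning of KLCC)."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Toy data (hypothetical): model predictions and mean opinion scores.
predicted = [0.2, 0.5, 0.4, 0.9, 0.7]
mos = [1.5, 3.0, 3.2, 4.8, 2.5]

print("PLCC:", round(pearson(predicted, mos), 4))
print("SRCC:", round(spearman(predicted, mos), 4))
print("KLCC:", round(kendall(predicted, mos), 4))
```

The rank-based coefficients depend only on the ordering of the scores, so they are unchanged by any monotonic rescaling of the predictions, whereas the Pearson coefficient is sensitive to how linear the relationship is.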
