
CONVIQT: Contrastive Video Quality Estimator

Pavan C. Madhusudana, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik

2022-06-29 | Self-Supervised Learning | Video Quality Assessment
Paper | PDF | Code (official)

Abstract

Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms. Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner. Distortion type identification and degradation level determination are employed as an auxiliary task to train a deep learning model containing a deep Convolutional Neural Network (CNN) that extracts spatial features, as well as a recurrent unit that captures temporal information. The model is trained using a contrastive loss, and we therefore refer to this training framework and resulting model as CONtrastive VIdeo Quality EstimaTor (CONVIQT). During testing, the weights of the trained model are frozen, and a linear regressor maps the learned features to quality scores in a no-reference (NR) setting. We conduct comprehensive evaluations of the proposed model on multiple VQA databases by analyzing the correlations between model predictions and ground-truth quality ratings, and achieve competitive performance when compared to state-of-the-art NR-VQA models, even though it is not trained on those databases. Our ablation experiments demonstrate that the learned representations are highly robust and generalize well across synthetic and realistic distortions. Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning. The implementations used in this work have been made available at https://github.com/pavancm/CONVIQT.
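For a concrete picture of the pipeline the abstract describes, the sketch below wires together a per-frame CNN, a recurrent temporal unit, a contrastive (NT-Xent-style) objective, and a frozen-feature linear readout. This is an illustrative approximation, not the official implementation: the ResNet-18 backbone, GRU size, projection dimension, temperature, and clip shapes are all assumptions; the official code at https://github.com/pavancm/CONVIQT has the actual model and training details.

```python
# Minimal sketch of a CONVIQT-style pipeline, assuming a ResNet-18 backbone and
# GRU temporal pooling (illustrative choices, not necessarily the paper's).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class VideoQualityEncoder(nn.Module):
    """Per-frame CNN features aggregated over time by a GRU (illustrative sizes)."""

    def __init__(self, feat_dim=128):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()           # expose the 512-d spatial feature vector
        self.cnn = backbone
        self.gru = nn.GRU(512, 256, batch_first=True)
        self.proj = nn.Linear(256, feat_dim)  # projection head for the contrastive loss

    def forward(self, video):                 # video: (batch, frames, 3, H, W)
        b, t = video.shape[:2]
        spatial = self.cnn(video.flatten(0, 1)).view(b, t, -1)
        _, h = self.gru(spatial)              # final hidden state summarizes the clip
        return self.proj(h.squeeze(0))        # (batch, feat_dim)


def contrastive_loss(z1, z2, temperature=0.1):
    """NT-Xent-style loss between two views (e.g. distorted variants) of the same clips."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / temperature
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(mask, float("-inf"))     # a sample is never its own positive
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(sim.device)
    return F.cross_entropy(sim, targets)


if __name__ == "__main__":
    encoder = VideoQualityEncoder()
    view_a = torch.randn(4, 8, 3, 112, 112)   # two views of the same 4 clips
    view_b = torch.randn(4, 8, 3, 112, 112)
    loss = contrastive_loss(encoder(view_a), encoder(view_b))
    loss.backward()                           # self-supervised pre-training step

    # At test time the encoder is frozen and a linear regressor maps the learned
    # features to quality scores in the no-reference setting.
    with torch.no_grad():
        feats = encoder(view_a)
    regressor = nn.Linear(feats.size(1), 1)   # would be fit on ground-truth quality ratings
    predicted_quality = regressor(feats)
```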

Results

Task                       Dataset        Metric   Value   Model
Video Understanding        LIVE-VQC       PLCC     0.817   CONVIQT
Video Understanding        YouTube-UGC    PLCC     0.822   CONVIQT
Video Understanding        LIVE-ETRI      SRCC     0.939   CONVIQT
Video Understanding        KoNViD-1k      PLCC     0.849   CONVIQT
Video Understanding        LIVE-FB LSVQ   PLCC     0.82    CONVIQT
Video Quality Assessment   LIVE-VQC       PLCC     0.817   CONVIQT
Video Quality Assessment   YouTube-UGC    PLCC     0.822   CONVIQT
Video Quality Assessment   LIVE-ETRI      SRCC     0.939   CONVIQT
Video Quality Assessment   KoNViD-1k      PLCC     0.849   CONVIQT
Video Quality Assessment   LIVE-FB LSVQ   PLCC     0.82    CONVIQT
Video                      LIVE-VQC       PLCC     0.817   CONVIQT
Video                      YouTube-UGC    PLCC     0.822   CONVIQT
Video                      LIVE-ETRI      SRCC     0.939   CONVIQT
Video                      KoNViD-1k      PLCC     0.849   CONVIQT
Video                      LIVE-FB LSVQ   PLCC     0.82    CONVIQT

Related Papers

A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder (2025-07-14)
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)
World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model (2025-07-01)
ShapeEmbed: a self-supervised learning framework for 2D contour quantification (2025-07-01)
RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models (2025-06-27)
Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features (2025-06-26)
Hybrid Deep Learning and Signal Processing for Arabic Dialect Recognition in Low-Resource Settings (2025-06-26)