
2BiVQA: Double Bi-LSTM based Video Quality Assessment of UGC Videos

Ahmed Telili, Sid Ahmed Fezza, Wassim Hamidouche, Hanene F. Z. Brachemi Meftah

2022-08-31 · Video Quality Assessment · Visual Question Answering (VQA)

Abstract

Recently, with the growing popularity of mobile devices and video sharing platforms (e.g., YouTube, Facebook, TikTok, and Twitch), User-Generated Content (UGC) videos have become increasingly common and now account for a large portion of multimedia traffic on the internet. Unlike professionally generated videos produced by filmmakers and videographers, UGC videos typically contain multiple authentic distortions, generally introduced during capture and processing by naive users. Quality prediction of UGC videos is of paramount importance for optimizing and monitoring their processing in hosting platforms, such as their coding, transcoding, and streaming. However, blind quality prediction of UGC is quite challenging because the degradations of UGC videos are unknown and very diverse, and no pristine reference is available. Therefore, in this paper, we propose an accurate and efficient Blind Video Quality Assessment (BVQA) model for UGC videos, which we name 2BiVQA for double Bi-LSTM Video Quality Assessment. The 2BiVQA metric consists of three main blocks: a pre-trained Convolutional Neural Network (CNN) that extracts discriminative features from image patches, which are then fed into two Recurrent Neural Networks (RNNs) for spatial and temporal pooling. Specifically, we use two Bi-directional Long Short-Term Memory (Bi-LSTM) networks: the first captures short-range dependencies between image patches, while the second captures long-range dependencies between frames to account for the temporal memory effect. Experimental results on recent large-scale UGC VQA datasets show that 2BiVQA achieves high performance at lower computational cost than most state-of-the-art VQA models. The source code of our 2BiVQA metric is made publicly available at: https://github.com/atelili/2BiVQA
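The pipeline described in the abstract (pre-trained CNN patch features, a patch-level Bi-LSTM for spatial pooling, a frame-level Bi-LSTM for temporal pooling, then a regression to a quality score) can be sketched in a few lines of PyTorch. This is a minimal illustration only: the ResNet-50 backbone, hidden sizes, mean pooling, and linear head are assumptions for the sketch, not the authors' implementation, which lives in the linked repository.

import torch
import torch.nn as nn
import torchvision.models as models

class TwoBiVQASketch(nn.Module):
    def __init__(self, feat_dim=2048, hidden=128):
        super().__init__()
        # Pre-trained CNN feature extractor (backbone choice is an assumption).
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop classifier
        for p in self.cnn.parameters():  # keep the pre-trained features frozen
            p.requires_grad = False
        # Bi-LSTM #1: short-range dependencies between patches of a frame.
        self.spatial_lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        # Bi-LSTM #2: long-range dependencies between frames (temporal memory effect).
        self.temporal_lstm = nn.LSTM(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # pooled features -> quality score

    def forward(self, video_patches):
        # video_patches: (batch, frames, patches, 3, H, W)
        b, t, p, c, h, w = video_patches.shape
        x = video_patches.reshape(b * t * p, c, h, w)
        feats = self.cnn(x).flatten(1)                  # (b*t*p, feat_dim)
        feats = feats.reshape(b * t, p, -1)
        spatial, _ = self.spatial_lstm(feats)           # pool over patches
        frame_feats = spatial.mean(dim=1).reshape(b, t, -1)
        temporal, _ = self.temporal_lstm(frame_feats)   # pool over frames
        return self.head(temporal.mean(dim=1)).squeeze(-1)  # one score per video

scores = TwoBiVQASketch()(torch.randn(2, 8, 4, 3, 224, 224))  # -> shape (2,)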

Results

Task                     | Dataset     | Metric | Value | Model
Video Understanding      | LIVE-VQC    | PLCC   | 0.832 | 2BiVQA
Video Understanding      | YouTube-UGC | PLCC   | 0.794 | 2BiVQA
Video Understanding      | KoNViD-1k   | PLCC   | 0.835 | 2BiVQA
Video Quality Assessment | LIVE-VQC    | PLCC   | 0.832 | 2BiVQA
Video Quality Assessment | YouTube-UGC | PLCC   | 0.794 | 2BiVQA
Video Quality Assessment | KoNViD-1k   | PLCC   | 0.835 | 2BiVQA
Video                    | LIVE-VQC    | PLCC   | 0.832 | 2BiVQA
Video                    | YouTube-UGC | PLCC   | 0.794 | 2BiVQA
Video                    | KoNViD-1k   | PLCC   | 0.835 | 2BiVQA
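All entries report PLCC, the Pearson Linear Correlation Coefficient between the model's predicted scores and each dataset's subjective Mean Opinion Scores (MOS); higher is better, with 1.0 meaning perfect linear agreement. A minimal sketch of the computation on toy data (the arrays below are illustrative, not dataset values):

import numpy as np
from scipy.stats import pearsonr

predicted = np.array([55.2, 61.8, 40.3, 72.9, 48.1])  # model-predicted quality scores (toy)
mos = np.array([52.0, 64.5, 38.7, 70.2, 50.9])        # ground-truth mean opinion scores (toy)

plcc, _ = pearsonr(predicted, mos)  # Pearson correlation; the p-value is discarded
print(f"PLCC = {plcc:.3f}")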

Related Papers

- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
- MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM (2025-07-16)
- Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
- Evaluating Attribute Confusion in Fashion Text-to-Image Generation (2025-07-09)
- LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation (2025-07-09)
- Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder (2025-06-28)
- Bridging Video Quality Scoring and Justification via Large Multimodal Models (2025-06-26)
- DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images (2025-06-26)