
Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment

Haoning Wu, Chaofeng Chen, Liang Liao, Jingwen Hou, Wenxiu Sun, Qiong Yan, Jinwei Gu, Weisi Lin

2022-10-11 · Video Quality Assessment · Visual Question Answering (VQA)
Paper · PDF · Code (official)

Abstract

The increased resolution of real-world videos presents a dilemma between efficiency and accuracy for deep Video Quality Assessment (VQA). On the one hand, keeping the original resolution leads to unacceptable computational costs. On the other hand, existing practices such as resizing and cropping change the quality of the original videos through loss of detail and content, and are therefore harmful to quality assessment. Drawing on insights from studies of spatial-temporal redundancy in the human visual system and from visual coding theory, we observe that quality information within a neighbourhood is typically similar, motivating us to investigate an effective quality-sensitive neighbourhood-representatives scheme for VQA. In this work, we propose a unified scheme, spatial-temporal grid mini-cube sampling (St-GMS), to obtain a novel type of sample named fragments. Full-resolution videos are first divided into mini-cubes with preset spatial-temporal grids; temporally aligned quality representatives are then sampled to compose the fragments that serve as inputs for VQA. In addition, we design the Fragment Attention Network (FANet), a network architecture tailored specifically for fragments. With fragments and FANet, the proposed efficient end-to-end FAST-VQA and FasterVQA achieve significantly better performance than existing approaches on all VQA benchmarks while requiring only 1/1612 of the FLOPs of the current state of the art. Code, models and demos are available at https://github.com/timothyhtimothy/FAST-VQA-and-FasterVQA.
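To make the sampling idea concrete, below is a minimal NumPy sketch of grid mini-cube sampling under stated assumptions: the function name, parameter names, and defaults (a 7x7 spatial grid of 32x32 patches, giving a 224x224 fragment) are illustrative, not the authors' API; the authoritative implementation lives in the linked repository.

import numpy as np

def sample_fragment(video, grid=7, patch=32, frames=8, rng=None):
    """Illustrative sketch of spatial-temporal grid mini-cube sampling.

    video: array of shape (T, H, W, C) at full resolution, assumed to
    satisfy H >= grid * patch and W >= grid * patch.
    Returns a fragment of shape (frames, grid*patch, grid*patch, C).
    """
    rng = np.random.default_rng() if rng is None else rng
    T, H, W, C = video.shape
    # Temporally aligned frame indices, spread uniformly over the clip.
    t_idx = np.linspace(0, T - 1, frames).astype(int)
    cell_h, cell_w = H // grid, W // grid
    fragment = np.empty((frames, grid * patch, grid * patch, C), video.dtype)
    for gy in range(grid):
        for gx in range(grid):
            # One random patch offset per grid cell, shared across all
            # sampled frames so temporal variation inside the mini-cube
            # is preserved (the "temporally aligned" representatives).
            y0 = min(gy * cell_h + int(rng.integers(0, max(cell_h - patch, 0) + 1)), H - patch)
            x0 = min(gx * cell_w + int(rng.integers(0, max(cell_w - patch, 0) + 1)), W - patch)
            fragment[:, gy * patch:(gy + 1) * patch,
                        gx * patch:(gx + 1) * patch] = \
                video[t_idx, y0:y0 + patch, x0:x0 + patch]
    return fragment

The design intuition: a fragment keeps full-resolution local texture from every region of the video (unlike resizing, which blurs detail, or cropping, which discards content) while shrinking the network input to grid * patch pixels per side, which is what makes end-to-end training tractable.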

Results

Task: Video Quality Assessment (the same entries are also listed under the Video Understanding and Video task tags)

Dataset        Metric   Value   Model
LIVE-VQC       PLCC     0.858   FasterVQA (fine-tuned)
YouTube-UGC    PLCC     0.859   FasterVQA (fine-tuned)
KoNViD-1k      PLCC     0.898   FasterVQA (fine-tuned)
LIVE-FB LSVQ   PLCC     0.874   FasterVQA
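For reference, PLCC is the Pearson linear correlation between predicted quality scores and subjective mean opinion scores (MOS); higher is better. A minimal sketch with made-up scores follows (VQA papers typically also report SROCC, and often fit a logistic mapping before computing PLCC, which this sketch omits):

import numpy as np
from scipy import stats

# Hypothetical predicted scores and ground-truth MOS for five videos.
pred = np.array([0.71, 0.42, 0.88, 0.35, 0.60])
mos  = np.array([0.68, 0.45, 0.92, 0.30, 0.57])

plcc, _ = stats.pearsonr(pred, mos)    # linear correlation, in [-1, 1]
srocc, _ = stats.spearmanr(pred, mos)  # rank correlation
print(f"PLCC = {plcc:.3f}, SROCC = {srocc:.3f}")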

Related Papers

VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM (2025-07-16)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
Evaluating Attribute Confusion in Fashion Text-to-Image Generation (2025-07-09)
LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation (2025-07-09)
Decoupled Seg Tokens Make Stronger Reasoning Video Segmenter and Grounder (2025-06-28)
Bridging Video Quality Scoring and Justification via Large Multimodal Models (2025-06-26)
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images (2025-06-26)