No-Reference Video Quality Assessment Using Space-Time Chips

2020-08-23 | Video Quality Assessment, Visual Question Answering (VQA)

Abstract

We propose a new prototype model for no-reference video quality assessment (VQA) based on the natural statistics of space-time chips of videos. Space-time chips (ST-chips) are a new, quality-aware feature space which we define as space-time localized cuts of video data in directions that are determined by the local motion flow. We use parametrized distribution fits to the bandpass histograms of space-time chips to characterize quality, and show that the parameters from these models are affected by distortion and can hence be used to objectively predict the quality of videos. Our prototype method, which we call ChipQA-0, is agnostic to the types of distortion affecting the video, and is based on identifying and quantifying deviations from the expected statistics of natural, undistorted ST-chips in order to predict video quality. We train and test our resulting model on several large VQA databases and show that our model achieves high correlation against human judgments of video quality and is competitive with state-of-the-art models.
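To make the idea of "parametrized distribution fits" concrete, the following is a minimal sketch that fits a zero-mean generalized Gaussian distribution (GGD) to a set of bandpass coefficients by moment matching. The abstract does not name the distribution family or the fitting procedure, so the GGD, the grid search, and the function names `ggd_ratio` and `fit_ggd` are all assumptions chosen for illustration, not the ChipQA-0 implementation. The synthetic Gaussian samples stand in for real ST-chip coefficients.

```python
import math
import random


def ggd_ratio(alpha):
    """Theoretical (E|x|)^2 / E[x^2] for a zero-mean GGD with shape alpha."""
    return math.gamma(2.0 / alpha) ** 2 / (
        math.gamma(1.0 / alpha) * math.gamma(3.0 / alpha)
    )


def fit_ggd(samples):
    """Moment-matching estimate of (shape, variance) for zero-mean samples.

    Hypothetical helper: the GGD family is an assumption, not taken
    from the abstract.
    """
    n = len(samples)
    mean_abs = sum(abs(x) for x in samples) / n
    second_moment = sum(x * x for x in samples) / n
    observed = mean_abs ** 2 / second_moment
    # Grid search over candidate shapes from 0.2 to 10.0 in steps of 0.001.
    candidates = [0.2 + 0.001 * i for i in range(9801)]
    shape = min(candidates, key=lambda a: abs(ggd_ratio(a) - observed))
    return shape, second_moment


# Synthetic stand-in for bandpass coefficients of an undistorted chip.
random.seed(0)
gaussian = [random.gauss(0.0, 1.0) for _ in range(20000)]
shape, variance = fit_ggd(gaussian)
# For Gaussian data the estimated shape should land close to 2.
print(f"shape={shape:.2f}, variance={variance:.2f}")
```

Under this assumption, the fitted shape and variance would serve as quality-aware features: distortion changes the histogram of the coefficients, which moves the fitted parameters away from the values seen on undistorted content.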

Results

Task	Dataset	Metric	Value	Model
Video Understanding	LIVE-ETRI	SRCC	0.4028	ChipQA-0
Video Understanding	LIVE Livestream	SRCC	0.7513	ChipQA-0
Video Quality Assessment	LIVE-ETRI	SRCC	0.4028	ChipQA-0
Video Quality Assessment	LIVE Livestream	SRCC	0.7513	ChipQA-0
Video	LIVE-ETRI	SRCC	0.4028	ChipQA-0
Video	LIVE Livestream	SRCC	0.7513	ChipQA-0
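
The SRCC values above are Spearman rank correlation coefficients between predicted quality scores and human opinion scores. As a rough illustration of how such a number is obtained, the sketch below ranks both score lists (averaging ranks for ties) and computes the Pearson correlation of the ranks. The helper names and the example scores are hypothetical and are not taken from any of the listed databases.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def srcc(predicted, subjective):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    rp = average_ranks(predicted)
    rs = average_ranks(subjective)
    n = len(rp)
    mean_p = sum(rp) / n
    mean_s = sum(rs) / n
    cov = sum((a - mean_p) * (b - mean_s) for a, b in zip(rp, rs))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_s = sum((b - mean_s) ** 2 for b in rs)
    return cov / math.sqrt(var_p * var_s)


# Hypothetical model predictions and mean opinion scores for five videos.
predicted_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
opinion_scores = [2.0, 1.0, 4.0, 3.0, 5.0]
print(srcc(predicted_scores, opinion_scores))
```

Because only ranks matter, SRCC rewards a model for ordering videos the same way viewers do, regardless of the scale of its raw scores.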
