Efficient Counterfactual Learning from Bandit Feedback

Yusuke Narita, Shota Yasui, Kohei Yata

2018-09-10	Visual Object Tracking	Causal Inference

Abstract

What is the most statistically efficient way to do off-policy evaluation and optimization with batch data from bandit feedback? For log data generated by contextual bandit algorithms, we consider offline estimators for the expected reward from a counterfactual policy. Our estimators are shown to have the lowest variance in a wide class of estimators, achieving variance reduction relative to standard estimators. We then apply our estimators to improve advertisement design by a major advertisement company. Consistent with the theoretical result, our estimators allow us to improve on the existing bandit algorithm with more statistical confidence than a state-of-the-art benchmark.
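To make "offline estimators for the expected reward from a counterfactual policy" concrete, the sketch below implements plain inverse propensity weighting (IPW) with known logging propensities, a common baseline of the kind the abstract calls "standard estimators". It is not the authors' estimator. The function names, the log tuple layout, and the simulated two-action environment are all assumptions made for illustration.

```python
import random


def ipw_value(logs, target_policy):
    """IPW estimate of the expected reward of a counterfactual policy.

    logs: iterable of (context, action, reward, propensity) tuples, where
          propensity is the probability with which the logging policy chose
          `action` in `context` (assumed known and positive).
    target_policy: function (context, action) -> probability that the
          counterfactual policy would choose `action` in `context`.
    """
    total = 0.0
    n = 0
    for context, action, reward, propensity in logs:
        if propensity <= 0:
            raise ValueError("logging propensity must be positive")
        # Reweight each logged reward by how much more (or less) likely the
        # target policy is to take the logged action than the logging policy.
        total += target_policy(context, action) / propensity * reward
        n += 1
    if n == 0:
        raise ValueError("no logged rounds")
    return total / n


def simulate_logs(n, seed=0):
    """Hypothetical environment: binary context, two actions, uniform logging.

    The reward is 1 with probability 0.8 when the action equals the context
    and with probability 0.2 otherwise.
    """
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        context = rng.choice([0, 1])
        action = rng.choice([0, 1])  # logging policy: uniform, propensity 0.5
        p_reward = 0.8 if action == context else 0.2
        reward = 1.0 if rng.random() < p_reward else 0.0
        logs.append((context, action, reward, 0.5))
    return logs


def match_context(context, action):
    """Deterministic counterfactual policy: always pick action == context."""
    return 1.0 if action == context else 0.0


logs = simulate_logs(20000)
estimate = ipw_value(logs, match_context)
print(f"IPW estimate of the counterfactual policy value: {estimate:.3f}")
```

By construction, the true value of `match_context` in this simulated environment is 0.8, so the printed estimate should land close to that. The variance of this baseline is the quantity that a more efficient estimator would aim to reduce.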

Results

Task	Dataset	Metric	Value
Object Tracking	VOT2014	Expected Average Overlap (EAO)	1.047
Causal Inference	IDHP	Average Treatment Effect Error	-0.225
Visual Object Tracking	VOT2014	Expected Average Overlap (EAO)	1.047

Related Papers

Estimating Interventional Distributions with Uncertain Causal Graphs through Meta-Learning	2025-07-07
UMDATrack: Unified Multi-Domain Adaptive Tracking Under Adverse Weather Conditions	2025-07-01
Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking	2025-06-30
R1-Track: Direct Application of MLLMs to Visual Object Tracking via Reinforcement Learning	2025-06-27
Causal-Aware Intelligent QoE Optimization for VR Interaction with Adaptive Keyframe Extraction	2025-06-24
Quantum Neural Networks for Propensity Score Estimation and Survival Analysis in Observational Biomedical Studies	2025-06-24
Bayesian Evolutionary Swarm Architecture: A Formal Epistemic System Grounded in Truth-Based Competition	2025-06-23
T-CPDL: A Temporal Causal Probabilistic Description Logic for Developing Logic-RAG Agent	2025-06-23