A Fast Algorithm for the Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit
Shintaro Nakamura, Masashi Sugiyama
2023-06-15Decision Making
Abstract
We study the real-valued combinatorial pure exploration problem in the stochastic multi-armed bandit (R-CPE-MAB). We study the case where the size of the action set is polynomial with respect to the number of arms. In such a case, the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits. We introduce an algorithm named the combinatorial gap-based exploration (CombGapE) algorithm, whose sample complexity upper bound matches the lower bound up to a problem-dependent constant factor. We numerically show that the CombGapE algorithm outperforms existing methods significantly in both synthetic and real-world datasets.
Related Papers
Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion2025-07-18Higher-Order Pattern Unification Modulo Similarity Relations2025-07-17Exploiting Constraint Reasoning to Build Graphical Explanations for Mixed-Integer Linear Programming2025-07-17Acting and Planning with Hierarchical Operational Models on a Mobile Robot: A Study with RAE+UPOM2025-07-15CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking2025-07-15Detección y Cuantificación de Erosión Fluvial con Visión Artificial2025-07-15Guiding LLM Decision-Making with Fairness Reward Models2025-07-15Turning Sand to Gold: Recycling Data to Bridge On-Policy and Off-Policy Learning via Causal Bound2025-07-15