On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment

Safwan Labbi, Paul Mangold, Daniil Tiapkin, Eric Moulines

2025-05-29

Policy Gradient Methods, Federated Learning, Q-Learning

Abstract

Ensuring convergence of policy gradient methods in federated reinforcement learning (FRL) under environment heterogeneity remains a major challenge. In this work, we first establish that, perhaps counter-intuitively, heterogeneity can force optimal policies to be non-deterministic or even time-varying, even in tabular environments. Subsequently, we prove global convergence results for federated policy gradient (FedPG) algorithms employing local updates, under a Łojasiewicz condition that holds only for each individual agent, in both entropy-regularized and non-regularized scenarios. Crucially, our theoretical analysis shows that FedPG attains linear speed-up with respect to the number of agents, a property central to efficient federated learning. Leveraging insights from our theoretical findings, we introduce b-RS-FedPG, a novel policy gradient method that employs a carefully constructed softmax-inspired parameterization coupled with an appropriate regularization scheme. We further establish explicit convergence rates for b-RS-FedPG toward near-optimal stationary policies. Finally, we show empirically that both FedPG and b-RS-FedPG consistently outperform federated Q-learning in heterogeneous settings.
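
To make the "local updates, then averaging" structure of a federated policy gradient scheme concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm: it assumes a single-state (bandit) environment per agent, exact gradients instead of sampled trajectories, a plain softmax parameterization, and no regularization. The function names (`softmax`, `local_gradient`, `fedpg`), the reward table, and all hyperparameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax over action logits."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def local_gradient(theta, rewards):
    """Exact gradient of J_c(theta) = <softmax(theta), rewards> for one agent.

    Assumption: each agent's environment is a one-state bandit, so its
    return is simply the expected reward under the softmax policy.
    """
    pi = softmax(theta)
    return pi * (rewards - pi @ rewards)


def fedpg(agent_rewards, rounds=200, local_steps=5, lr=0.5):
    """Toy federated policy gradient: local ascent steps, then averaging."""
    n_agents, n_actions = agent_rewards.shape
    theta = np.zeros(n_actions)  # shared parameters held by the server
    for _ in range(rounds):
        local_params = []
        for rewards in agent_rewards:
            theta_c = theta.copy()  # each agent starts from the shared point
            for _ in range(local_steps):
                theta_c += lr * local_gradient(theta_c, rewards)
            local_params.append(theta_c)
        theta = np.mean(local_params, axis=0)  # server averages the agents
    return theta


# Hypothetical heterogeneous rewards: one row per agent, one column per action.
agent_rewards = np.array([
    [1.0, 0.0, 0.2],
    [0.8, 0.1, 0.3],
    [0.9, 0.0, 0.1],
])

theta = fedpg(agent_rewards)
policy = softmax(theta)
avg_return = policy @ agent_rewards.mean(axis=0)
print("policy:", policy)
print("average return across agents:", avg_return)
```

In this toy setting every agent prefers the same action, so the averaged objective has a deterministic optimum; the non-deterministic or time-varying optimal policies described in the abstract arise in multi-state environments with heterogeneous dynamics, which this sketch does not model.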
