Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

REINFORCE

Reinforcement Learning | Introduced 1999 | 185 papers

Description

REINFORCE is a Monte Carlo policy gradient algorithm in reinforcement learning. The agent samples an episode with its current policy and uses the sampled returns to update the policy parameter $\theta$. Because the return is only known once a full trajectory has been completed, the update is applied at the end of each episode. Because that trajectory is drawn from the current policy, REINFORCE is an on-policy algorithm.

$$\nabla_{\theta}J\left(\theta\right) = \mathbb{E}_{\pi}\left[G_{t}\nabla_{\theta}\ln\pi_{\theta}\left(A_{t}\mid S_{t}\right)\right]$$
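
The gradient estimate above can be turned into a stochastic update of the form θ ← θ + α G_t ∇_θ ln π_θ(A_t | S_t), applied over the steps of each sampled episode. The following is a minimal sketch of that idea, not a reference implementation. It assumes a tabular softmax policy and a toy corridor environment invented for illustration; the names `sample_episode` and `reinforce`, the step size `alpha`, and the discount `gamma` are all hypothetical choices. It also omits the extra γ^t factor that some textbook presentations include, and it uses no baseline.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of action preferences."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step, working backwards."""
    returns = np.zeros(len(rewards))
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def grad_log_softmax(logits, action):
    """Gradient of ln pi(action) w.r.t. the logits: one_hot(action) - pi."""
    grad = -softmax(logits)
    grad[action] += 1.0
    return grad


def sample_episode(theta, rng, max_steps=50):
    """Roll out one episode with the current policy (toy corridor, assumed).

    Each row of theta holds the two action preferences for one state.
    Action 1 moves right, action 0 moves left (clamped at state 0).
    Stepping right from the last state ends the episode with reward +1;
    every other step gives reward 0.
    """
    n_states = theta.shape[0]
    state = 0
    states, actions, rewards = [], [], []
    for _ in range(max_steps):
        action = int(rng.choice(2, p=softmax(theta[state])))
        next_state = state + 1 if action == 1 else max(state - 1, 0)
        done = next_state == n_states
        states.append(state)
        actions.append(action)
        rewards.append(1.0 if done else 0.0)
        if done:
            break
        state = next_state
    return states, actions, rewards


def reinforce(n_states=4, episodes=500, alpha=0.1, gamma=0.9, seed=0):
    """Run REINFORCE: sample a full episode, then update theta from its returns."""
    rng = np.random.default_rng(seed)
    theta = np.zeros((n_states, 2))
    for _ in range(episodes):
        states, actions, rewards = sample_episode(theta, rng)
        returns = discounted_returns(rewards, gamma)
        # Monte Carlo policy-gradient step for each visited (state, action).
        for s, a, g in zip(states, actions, returns):
            theta[s] += alpha * g * grad_log_softmax(theta[s], a)
    return theta
```

Calling `theta = reinforce()` and then `softmax(theta[0])` shows the learned action probabilities in the start state. Note that every update uses only data generated by the policy being updated, which is what makes the method on-policy, and that nothing is updated until the episode has finished, which is the Monte Carlo aspect.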

Image Credit: Tingwu Wang

Papers Using This Method

Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards (2025-06-25)
Quantum Fisher-Preconditioned Reinforcement Learning: From Single-Qubit Control to Rayleigh-Fading Link Adaptation (2025-06-18)
Zeroth-Order Optimization is Secretly Single-Step Policy Optimization (2025-06-17)
Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective (2025-06-03)
REOrdering Patches Improves Vision Models (2025-05-29)
Policy Gradient with Second Order Momentum (2025-05-16)
Measures of Variability for Risk-averse Policy Gradient (2025-04-15)
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs (2025-03-18)
Multi-Fidelity Policy Gradient Algorithms (2025-03-07)
VQEL: Enabling Self-Developed Symbolic Language in Agents through Vector Quantization in Emergent Language Games (2025-03-06)
REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective (2025-02-24)
Sample Complexity of Linear Quadratic Regulator Without Initial Stability (2025-02-20)
REINFORCE-ING Chemical Language Models in Drug Design (2025-01-27)
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models (2025-01-04)
Graph-attention-based Casual Discovery with Trust Region-navigated Clipping Policy Optimization (2024-12-27)
QE-EBM: Using Quality Estimators as Energy Loss for Machine Translation (2024-10-14)
On Divergence Measures for Training GFlowNets (2024-10-12)
TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning (2024-09-19)
Quantum-inspired Reinforcement Learning for Synthesizable Drug Design (2024-09-13)
Automated Data Augmentation for Few-Shot Time Series Forecasting: A Reinforcement Learning Approach Guided by a Model Zoo (2024-09-10)