Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

REINFORCE

Reinforcement Learning | Introduced 1999 | 185 papers

Description

REINFORCE is a Monte Carlo variant of the policy gradient algorithm in reinforcement learning. The agent collects the samples of one episode using its current policy and uses them to update the policy parameter $\theta$. Because a full trajectory must be completed before the return can be computed, the update happens once per episode, and because that trajectory is drawn from the current policy, REINFORCE is an on-policy algorithm.
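The Monte Carlo part of the method comes down to computing, for every step of a finished episode, the return that followed it (the $G_t$ in the gradient expression). A minimal sketch of that step is shown here, assuming the episode's rewards are available as a plain list. The function name `discounted_returns` and the discount factor `gamma` are illustrative assumptions and are not taken from this page.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute the return G_t for every step t of one finished episode.

    `rewards[t]` is the reward received after the action at step t.
    `gamma` is an assumed discount factor; gamma=1.0 gives undiscounted returns.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so that G_t = r_t + gamma * G_{t+1} needs a single pass.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

The backward pass is why the whole trajectory has to be finished first: the return at the first step depends on every reward that comes after it.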

$\nabla_{\theta}J\left(\theta\right) = \mathbb{E}_{\pi}\left[G_{t}\nabla_{\theta}\ln\pi_{\theta}\left(A_{t}\mid{S_{t}}\right)\right]$
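As an illustration of how this gradient can be turned into an update rule, the sketch below uses a tabular softmax policy, for which the gradient of $\ln\pi_{\theta}(A_t \mid S_t)$ with respect to the preference of action $a$ in state $S_t$ is $\mathbb{1}[a = A_t] - \pi_{\theta}(a \mid S_t)$. Everything specific in it is an assumption made for the example: the corridor environment, the function names, the step size `alpha`, and the discount `gamma` are hypothetical and do not come from this page.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_episode(theta, rng, n_states=4, max_steps=50):
    """Roll out one episode with the current policy in a toy corridor.

    Hypothetical environment: states 0..n_states-1, action 0 moves left,
    action 1 moves right, every step costs -1, and the episode ends at the
    rightmost state or after max_steps.
    """
    state, episode = 0, []
    for _ in range(max_steps):
        probs = softmax(theta[state])
        action = 0 if rng.random() < probs[0] else 1
        episode.append((state, action, -1.0))
        state = max(state - 1, 0) if action == 0 else state + 1
        if state == n_states - 1:
            break
    return episode


def reinforce_update(theta, episode, alpha=0.05, gamma=1.0):
    """Apply one gradient-ascent step to theta (in place) from one episode.

    `episode` is a list of (state, action, reward) tuples.
    """
    grad = [[0.0] * len(row) for row in theta]
    g = 0.0
    # Accumulate G_t backwards and sum G_t * grad(ln pi(A_t | S_t)).
    for state, action, reward in reversed(episode):
        g = reward + gamma * g
        probs = softmax(theta[state])
        for a in range(len(probs)):
            indicator = 1.0 if a == action else 0.0
            grad[state][a] += g * (indicator - probs[a])
    for s, row in enumerate(grad):
        for a, value in enumerate(row):
            theta[s][a] += alpha * value
    return theta


def train(n_episodes=500, n_states=4, seed=0):
    """Alternate between sampling an episode and updating the policy."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(n_episodes):
        episode = sample_episode(theta, rng, n_states)
        reinforce_update(theta, episode)
    return theta
```

Calling `train()` returns the learned preference table, and `softmax(theta[0])` then gives the action probabilities in the start state. The sketch uses the raw return as the weight, with no baseline, so individual updates can have high variance; practical implementations often subtract a baseline from $G_t$.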

Image Credit: Tingwu Wang

Papers Using This Method

Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards (2025-06-25)
Quantum Fisher-Preconditioned Reinforcement Learning: From Single-Qubit Control to Rayleigh-Fading Link Adaptation (2025-06-18)
Zeroth-Order Optimization is Secretly Single-Step Policy Optimization (2025-06-17)
Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective (2025-06-03)
REOrdering Patches Improves Vision Models (2025-05-29)
Policy Gradient with Second Order Momentum (2025-05-16)
Measures of Variability for Risk-averse Policy Gradient (2025-04-15)
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs (2025-03-18)
Multi-Fidelity Policy Gradient Algorithms (2025-03-07)
VQEL: Enabling Self-Developed Symbolic Language in Agents through Vector Quantization in Emergent Language Games (2025-03-06)
REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective (2025-02-24)
Sample Complexity of Linear Quadratic Regulator Without Initial Stability (2025-02-20)
REINFORCE-ING Chemical Language Models in Drug Design (2025-01-27)
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models (2025-01-04)
Graph-attention-based Casual Discovery with Trust Region-navigated Clipping Policy Optimization (2024-12-27)
QE-EBM: Using Quality Estimators as Energy Loss for Machine Translation (2024-10-14)
On Divergence Measures for Training GFlowNets (2024-10-12)
TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning (2024-09-19)
Quantum-inspired Reinforcement Learning for Synthesizable Drug Design (2024-09-13)
Automated Data Augmentation for Few-Shot Time Series Forecasting: A Reinforcement Learning Approach Guided by a Model Zoo (2024-09-10)