Step-DPO

Step-wise Direct Preference Optimization

Natural Language Processing · Introduced 2000 · 2 papers

Description

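Step-DPO (Step-wise Direct Preference Optimization) adapts Direct Preference Optimization (DPO) to long-chain reasoning in large language models. It does not compare two complete answers. Instead, it treats each individual reasoning step as the unit of preference optimization: given a problem and a shared prefix of correct steps, the model is trained to prefer a correct next step over an erroneous one. This finer-grained signal points at the specific step where a reasoning chain goes wrong, which a preference over whole answers cannot localize.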
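The following is a minimal sketch of a step-wise preference loss. It assumes the standard DPO sigmoid objective, applied to a single reasoning step conditioned on the prompt and the preceding correct steps, rather than to a full response. The names `StepPair`, `step_dpo_loss` and `batch_step_dpo_loss` are hypothetical. The default `beta=0.1` and the averaging over a batch are also assumptions, not values taken from the papers listed below. The per-step log-probabilities (summed over the tokens of each step) are assumed to be computed elsewhere. How the erroneous steps are collected is not shown.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class StepPair:
    """One step-wise preference pair (hypothetical container for illustration).

    Each field is the summed token log-probability of a single reasoning step,
    conditioned on the prompt x plus the shared prefix of earlier correct steps.
    """
    policy_chosen: float    # log pi_theta(s_win  | x, s_1..s_{k-1})
    policy_rejected: float  # log pi_theta(s_lose | x, s_1..s_{k-1})
    ref_chosen: float       # log pi_ref(s_win  | x, s_1..s_{k-1})
    ref_rejected: float     # log pi_ref(s_lose | x, s_1..s_{k-1})


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)), i.e. log(1 + exp(-z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    # For negative z, rewrite as -z + log(1 + exp(z)) to avoid overflow.
    return -z + math.log1p(math.exp(z))


def step_dpo_loss(pair: StepPair, beta: float = 0.1) -> float:
    """DPO-style sigmoid loss on one step instead of a whole response.

    Assumed form: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).
    """
    chosen_ratio = pair.policy_chosen - pair.ref_chosen
    rejected_ratio = pair.policy_rejected - pair.ref_rejected
    return neg_log_sigmoid(beta * (chosen_ratio - rejected_ratio))


def batch_step_dpo_loss(pairs: List[StepPair], beta: float = 0.1) -> float:
    """Mean step-wise loss over a batch (averaging is an assumption)."""
    if not pairs:
        raise ValueError("need at least one step pair")
    return sum(step_dpo_loss(p, beta) for p in pairs) / len(pairs)


if __name__ == "__main__":
    # Policy raises the correct step's log-prob and lowers the wrong one's,
    # relative to the reference model, so the loss drops below log(2).
    pair = StepPair(policy_chosen=-2.0, policy_rejected=-5.0,
                    ref_chosen=-3.0, ref_rejected=-4.0)
    print(round(step_dpo_loss(pair), 4))  # prints 0.5981
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls as the policy shifts probability mass toward the correct step and away from the erroneous one, relative to the reference model. Because both steps share the same prompt and prefix, the gradient concentrates on the single step where the two continuations diverge.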

Papers Using This Method

Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning (2025-02-20)

Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs (2024-06-26)