Step-DPO
Step-wise Direct Preference Optimization
Natural Language Processing
Introduced 2024
2 papers
Description
Step-DPO is a variant of Direct Preference Optimization (DPO) that applies preference optimization at the level of individual reasoning steps rather than entire responses. Instead of comparing a full preferred answer against a full dispreferred one, it contrasts a correct next step with an incorrect one given the same problem and the shared prefix of earlier steps, which provides a denser training signal for long-chain reasoning tasks such as mathematical problem solving.
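The step-level objective can be sketched as the standard DPO loss evaluated on a single step-preference pair. The function below is a minimal illustrative sketch, not the authors' implementation; the function name, argument names, and the choice of beta are assumptions for illustration.

```python
import math

def dpo_step_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.5):
    """Step-level DPO loss for one preference pair (illustrative sketch).

    logp_w / logp_l: policy log-probabilities of the preferred (correct) and
    dispreferred (incorrect) next reasoning step, conditioned on the problem
    plus the shared prefix of earlier correct steps.
    ref_logp_w / ref_logp_l: the same quantities under a frozen reference model.
    beta: temperature scaling the implicit reward margin (assumed value).
    """
    # Implicit reward margin between the preferred and dispreferred step
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written stably as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))
```

When the policy matches the reference model, the margin is zero and the loss equals log 2; pushing probability mass toward the correct step and away from the incorrect one drives the loss below that value.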
Papers Using This Method
Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning
2025-02-20
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
2024-06-26