DPO

Direct Preference Optimization

Reinforcement Learning · Introduced 2023 · 409 papers
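Direct Preference Optimization (DPO), introduced by Rafailov et al. (2023) in "Direct Preference Optimization: Your Language Model is Secretly a Reward Model", aligns a language model to human preference data without training an explicit reward model or running reinforcement learning. Given a prompt $x$ with a preferred completion $y_w$ and a dispreferred completion $y_l$, DPO reparameterizes the RLHF objective so that the policy itself encodes an implicit reward, and optimizes a simple classification loss:

$$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$

where $\pi_{\mathrm{ref}}$ is a frozen reference model (typically the SFT checkpoint) and $\beta$ controls how far the policy may drift from it. Below is a minimal PyTorch sketch of this loss; the function and argument names are illustrative, and it assumes the summed per-token log-probabilities of each completion have already been computed under both models.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of preference pairs.

    Each argument holds the summed log-probabilities of the chosen or
    rejected completion under the trainable policy or the frozen
    reference model (one value per pair in the batch).
    """
    # Implicit reward of each completion: beta times the
    # policy/reference log-probability ratio.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Bradley-Terry preference loss: maximize the margin between
    # the chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Because the loss reduces to binary classification on log-probability ratios, DPO trains with a standard supervised pipeline and needs no sampling from the policy during training, which is its main practical advantage over PPO-based RLHF.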

Papers Using This Method

- Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment (2025-06-24)
- Smart-LLaMA-DPO: Reinforced Large Language Model for Explainable Smart Contract Vulnerability Detection (2025-06-23)
- video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models (2025-06-18)
- TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization (2025-06-17)
- Alignment Quality Index (AQI): Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations (2025-06-16)
- CAMS: A CityGPT-Powered Agentic Framework for Urban Human Mobility Simulation (2025-06-16)
- From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring (2025-06-11)
- Towards Bridging the Reward-Generation Gap in Direct Alignment Algorithms (2025-06-11)
- Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning (2025-06-11)
- QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA (2025-06-09)
- Explicit Preference Optimization: No Need for an Implicit Reward Model (2025-06-09)
- LeVo: High-Quality Song Generation with Multi-Preference Alignment (2025-06-09)
- LeanPO: Lean Preference Optimization for Likelihood Alignment in Video-LLMs (2025-06-05)
- Aligning Large Language Models with Implicit Preferences from User-Generated Content (2025-06-04)
- SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models (2025-06-04)
- Understanding the Impact of Sampling Quality in Direct Preference Optimization (2025-06-03)
- Protein Inverse Folding From Structure Feedback (2025-06-03)
- Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences (2025-06-03)
- IF-GUIDE: Influence Function-Guided Detoxification of LLMs (2025-06-02)
- Generalizable LLM Learning of Graph Synthetic Data with Reinforcement Learning (2025-06-01)