DPO

Direct Preference Optimization

Reinforcement Learning · Introduced 2023 · 409 papers
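Direct Preference Optimization (DPO), introduced by Rafailov et al. (2023) in "Direct Preference Optimization: Your Language Model is Secretly a Reward Model", aligns a language model to human preference data without training an explicit reward model or running reinforcement learning. Given a prompt $x$ with a preferred completion $y_w$ and a dispreferred completion $y_l$, DPO reparameterizes the RLHF objective so that the policy itself encodes an implicit reward, and optimizes a simple classification loss:

$$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$

where $\pi_{\mathrm{ref}}$ is a frozen reference model (typically the SFT checkpoint) and $\beta$ controls how far the policy may drift from it. Below is a minimal PyTorch sketch of this loss; the function and argument names are illustrative, and it assumes the summed per-token log-probabilities of each completion have already been computed under both models.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of preference pairs.

    Each argument holds the summed log-probabilities of the chosen or
    rejected completion under the trainable policy or the frozen
    reference model (one value per pair in the batch).
    """
    # Implicit reward of each completion: beta times the
    # policy/reference log-probability ratio.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Bradley-Terry preference loss: maximize the margin between
    # the chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Because the loss reduces to binary classification on log-probability ratios, DPO trains with a standard supervised pipeline and needs no sampling from the policy during training, which is its main practical advantage over PPO-based RLHF.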

Papers Using This Method

- Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment (2025-06-24)
- Smart-LLaMA-DPO: Reinforced Large Language Model for Explainable Smart Contract Vulnerability Detection (2025-06-23)
- video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models (2025-06-18)
- TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization (2025-06-17)
- Alignment Quality Index (AQI): Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations (2025-06-16)
- CAMS: A CityGPT-Powered Agentic Framework for Urban Human Mobility Simulation (2025-06-16)
- From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring (2025-06-11)
- Towards Bridging the Reward-Generation Gap in Direct Alignment Algorithms (2025-06-11)
- Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning (2025-06-11)
- QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA (2025-06-09)
- Explicit Preference Optimization: No Need for an Implicit Reward Model (2025-06-09)
- LeVo: High-Quality Song Generation with Multi-Preference Alignment (2025-06-09)
- LeanPO: Lean Preference Optimization for Likelihood Alignment in Video-LLMs (2025-06-05)
- Aligning Large Language Models with Implicit Preferences from User-Generated Content (2025-06-04)
- SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models (2025-06-04)
- Understanding the Impact of Sampling Quality in Direct Preference Optimization (2025-06-03)
- Protein Inverse Folding From Structure Feedback (2025-06-03)
- Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences (2025-06-03)
- IF-GUIDE: Influence Function-Guided Detoxification of LLMs (2025-06-02)
- Generalizable LLM Learning of Graph Synthetic Data with Reinforcement Learning (2025-06-01)