DAPO
Dialogue-Adaptive Pre-training Objective
Description
Dialogue-Adaptive Pre-training Objective (DAPO) is a pre-training objective for dialogue adaptation. It is designed to measure dialogue quality along multiple important aspects: some, such as readability, consistency, and fluency, are already targeted by general LM pre-training objectives, while others, such as diversity and specificity, are significant for assessing dialogues but ignored by general LM pre-training objectives.
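As a rough illustration of the idea, per-aspect quality scores for a dialogue can be combined into a single weight that scales its contribution to the pre-training loss. This is a minimal sketch under assumed conventions: the aspect names follow the description above, but the scoring inputs, the averaging scheme, and the weighted-loss formulation are illustrative assumptions, not the paper's actual method.

```python
# Aspect names taken from the DAPO description; everything else is a
# hypothetical sketch of a quality-weighted pre-training objective.
ASPECTS = ["readability", "consistency", "fluency", "diversity", "specificity"]


def combine_quality(scores):
    """Average per-aspect quality scores (each assumed in [0, 1]) into one
    scalar quality estimate for a dialogue."""
    return sum(scores[a] for a in ASPECTS) / len(ASPECTS)


def weighted_lm_loss(token_loss, scores):
    """Scale a dialogue's language-modeling loss by its estimated quality,
    so higher-quality dialogues contribute more to the gradient."""
    return combine_quality(scores) * token_loss


# Example: a dialogue judged mediocre on every aspect contributes
# half of its raw LM loss.
scores = {a: 0.5 for a in ASPECTS}
print(weighted_lm_loss(2.0, scores))
```

In practice the per-aspect scores would come from learned quality-estimation heads rather than being given directly; the simple mean here stands in for whatever aggregation the objective actually uses.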
Papers Using This Method
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent (2025-07-03)
CodeV-R1: Reasoning-Enhanced Verilog Generation (2025-05-30)
CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models (2025-05-28)
MASKSEARCH: A Universal Pre-Training Framework to Enhance Agentic Search Capability (2025-05-26)
Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective (2025-05-23)
KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning (2025-05-22)
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization (2025-05-18)
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks (2025-04-07)
Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization (2024-12-24)
Dual Approximation Policy Optimization (2024-10-02)
Dialogue-adaptive Language Model Pre-training From Quality Estimation (2020-09-10)