Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

Guan Wang, Sijie Cheng, Xianyuan Zhan, Xiangang Li, Sen Song, Yang Liu

2023-09-20 · Math Word Problem Solving · Arithmetic Reasoning · Code Generation
Paper · PDF · Code (official)

Abstract

Open-source large language models such as LLaMA have recently emerged. Recent developments have incorporated supervised fine-tuning (SFT) and reinforcement learning fine-tuning (RLFT) to align these models with human goals. However, SFT methods treat all training data of mixed quality equally, while RLFT methods require high-quality pairwise or ranking-based preference data. In this study, we present a novel framework, named OpenChat, to advance open-source language models with mixed-quality data. Specifically, we consider the general SFT training data, consisting of a small amount of expert data mixed with a large proportion of sub-optimal data, without any preference labels. We propose C(onditioned)-RLFT, which regards different data sources as coarse-grained reward labels and learns a class-conditioned policy to leverage complementary data quality information. Interestingly, the optimal policy in C-RLFT can be easily solved through single-stage, RL-free supervised learning, which is lightweight and avoids costly human preference labeling. Through extensive experiments on three standard benchmarks, our openchat-13b fine-tuned with C-RLFT achieves the highest average performance among all 13b open-source language models. Moreover, we use AGIEval to validate the model's generalization performance, on which only openchat-13b surpasses the base model. Finally, we conduct a series of analyses to shed light on the effectiveness and robustness of OpenChat. Our code, data, and models are publicly available at https://github.com/imoneoi/openchat and https://huggingface.co/openchat.
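The core idea in the abstract — treating data-source labels as coarse-grained rewards and reducing the RL objective to single-stage weighted supervised learning — can be sketched as a reward-weighted loss. The sketch below is illustrative, not the authors' implementation; the `SOURCE_REWARD` values, the `beta` temperature, and the function names are assumptions for the example.

```python
import math

# Hypothetical coarse-grained rewards per data source (assumption for
# illustration): expert data is trusted far more than sub-optimal data.
SOURCE_REWARD = {"expert": 1.0, "suboptimal": 0.1}

def conditioned_weight(source: str, beta: float = 1.0) -> float:
    """Exponentiated-reward weight for one example, as in
    reward-weighted regression: w = exp(r(source) / beta)."""
    return math.exp(SOURCE_REWARD[source] / beta)

def c_rlft_loss(log_probs: list[float], sources: list[str],
                beta: float = 1.0) -> float:
    """Normalized weighted negative log-likelihood over a batch.

    log_probs: per-example log pi(y | x, source) from a policy whose
               prompt is conditioned on the source label (e.g. a tag
               token identifying the data source).
    sources:   per-example coarse data-source labels.
    """
    weights = [conditioned_weight(s, beta) for s in sources]
    total = sum(weights)
    # Expert examples receive larger weights, so the policy is pulled
    # more strongly toward them, without any pairwise preference data.
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / total
```

Because the weights depend only on fixed source labels, minimizing this loss is ordinary supervised learning — no reward model, rollouts, or RL loop is needed, which matches the "single-stage, RL-free" claim in the abstract.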

Results

Task | Dataset | Metric | Value | Model
Question Answering | MATH | Accuracy | 28.9 | OpenChat-3.5-1210 7B
Question Answering | MATH | Parameters (Billions) | 7 | OpenChat-3.5-1210 7B
Question Answering | MATH | Accuracy | 28.6 | OpenChat-3.5 7B
Question Answering | MATH | Parameters (Billions) | 7 | OpenChat-3.5 7B
Math Word Problem Solving | MATH | Accuracy | 28.9 | OpenChat-3.5-1210 7B
Math Word Problem Solving | MATH | Parameters (Billions) | 7 | OpenChat-3.5-1210 7B
Math Word Problem Solving | MATH | Accuracy | 28.6 | OpenChat-3.5 7B
Math Word Problem Solving | MATH | Parameters (Billions) | 7 | OpenChat-3.5 7B
Mathematical Question Answering | MATH | Accuracy | 28.9 | OpenChat-3.5-1210 7B
Mathematical Question Answering | MATH | Parameters (Billions) | 7 | OpenChat-3.5-1210 7B
Mathematical Question Answering | MATH | Accuracy | 28.6 | OpenChat-3.5 7B
Mathematical Question Answering | MATH | Parameters (Billions) | 7 | OpenChat-3.5 7B
Mathematical Reasoning | MATH | Accuracy | 28.9 | OpenChat-3.5-1210 7B
Mathematical Reasoning | MATH | Parameters (Billions) | 7 | OpenChat-3.5-1210 7B
Mathematical Reasoning | MATH | Accuracy | 28.6 | OpenChat-3.5 7B
Mathematical Reasoning | MATH | Parameters (Billions) | 7 | OpenChat-3.5 7B
Arithmetic Reasoning | GSM8K | Accuracy | 77.3 | OpenChat-3.5 7B
Arithmetic Reasoning | GSM8K | Parameters (Billions) | 7 | OpenChat-3.5 7B

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts (2025-07-17)
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks (2025-07-16)
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
DCR: Quantifying Data Contamination in LLMs Evaluation (2025-07-15)
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs (2025-07-15)
Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding (2025-07-14)
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks (2025-07-14)