Guillaume Sanchez, Honglu Fan, Alexander Spangher, Elad Levi, Pawan Sasanka Ammanamanchi, Stella Biderman
Classifier-Free Guidance (CFG) has recently emerged in text-to-image generation as a lightweight technique to encourage prompt adherence in generations. In this work, we demonstrate that CFG can be used broadly as an inference-time technique in pure language modeling. We show that CFG (1) improves the performance of Pythia, GPT-2 and LLaMA-family models across an array of tasks: Q&A, reasoning, code generation, and machine translation, achieving SOTA on LAMBADA with LLaMA-7B over PaLM-540B; (2) brings improvements equivalent to a model with twice the parameter count; (3) can stack alongside other inference-time methods like Chain-of-Thought and Self-Consistency, yielding further improvements in difficult tasks; (4) can be used to increase the faithfulness and coherence of assistants in challenging form-driven and content-driven prompts: in a human evaluation we show a 75% preference for GPT4All using CFG over baseline.
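
The abstract describes CFG as an inference-time technique but does not spell out the update rule. A minimal sketch follows, assuming the commonly used formulation in which the next-token log-probabilities computed with the prompt are extrapolated away from those computed without it by a guidance strength `gamma`. The function names, the toy logits, and the exact rule are illustrative assumptions, not taken from the text above.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cfg_log_probs(cond_logits, uncond_logits, gamma):
    """Combine prompt-conditioned and unconditioned next-token logits.

    Assumed rule (hypothetical for this sketch):
        mixed = uncond + gamma * (cond - uncond)
    gamma = 1 recovers ordinary conditional sampling, gamma = 0 ignores
    the prompt, and gamma > 1 pushes the distribution toward tokens that
    the prompt makes more likely.
    """
    cond = log_softmax(cond_logits)
    uncond = log_softmax(uncond_logits)
    mixed = [u + gamma * (c - u) for c, u in zip(cond, uncond)]
    # Renormalize so the result is a valid log-probability distribution.
    return log_softmax(mixed)


# Toy three-token vocabulary: the prompt favors token 0,
# while the unconditioned model is indifferent.
cond_logits = [2.0, 1.0, 0.0]
uncond_logits = [1.0, 1.0, 1.0]
guided = cfg_log_probs(cond_logits, uncond_logits, gamma=1.5)
```

In this sketch each decoding step needs two forward passes (one with the prompt, one without), so the guided distribution costs roughly twice the compute of plain sampling at a fixed model size.
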
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text Generation | SciQ | Accuracy | 96.6 | LLaMA-65B + CFG (zero-shot) |
| Text Generation | SciQ | Accuracy | 96.4 | LLaMA-30B + CFG (zero-shot) |
| Text Generation | SciQ | Accuracy | 95.1 | LLaMA-13B + CFG (zero-shot) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 84.2 | LLaMA-65B + CFG (zero-shot) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 83.2 | LLaMA-30B + CFG (zero-shot) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 79.1 | LLaMA-13B + CFG (zero-shot) |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 58.9 | LLaMA-7B + CFG (zero-shot) |
| Language Modelling | LAMBADA | Accuracy | 84 | LLaMA-65B + CFG (zero-shot) |
| Language Modelling | LAMBADA | Accuracy | 83.9 | LLaMA-30B + CFG (zero-shot) |
| Language Modelling | LAMBADA | Accuracy | 82.2 | LLaMA-13B + CFG (zero-shot) |
| Sentence Completion | HellaSwag | Accuracy | 86.3 | LLaMA-65B + CFG (zero-shot) |
| Sentence Completion | HellaSwag | Accuracy | 85.3 | LLaMA-30B + CFG (zero-shot) |
| Sentence Completion | HellaSwag | Accuracy | 82.1 | LLaMA-13B + CFG (zero-shot) |