Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


SFT

Shrink and Fine-Tune

General · Introduced 2020 · 415 papers
Source Paper

Description

Shrink and Fine-Tune, or SFT, is a type of distillation that avoids explicit distillation by copying parameters to a student model and then fine-tuning. Specifically, it extracts a student model from the maximally spaced layers of a fine-tuned teacher. Each layer l ∈ L′ is copied fully from L. For example, when creating a BART student with 3 decoder layers from a teacher with 12 encoder layers and 12 decoder layers, we copy the teacher's full encoder Enc^L and decoder layers 0, 6, and 11 to the student. When deciding which layers to copy, we break ties arbitrarily; copying layers 0, 5, and 11 might work just as well. When copying only one decoder layer, we copy layer 0; this was found to work better than copying layer 11. The impact of initialization on performance is measured experimentally in Section 6.1 of the source paper. After initialization, the student model continues to fine-tune on the summarization dataset, with the objective of minimizing the data loss L_Data.
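The layer-selection rule described above can be sketched in a few lines. This is a minimal illustration, not code from the source paper; the helper name `pick_layers` and the tie-breaking-by-rounding choice are assumptions:

```python
def pick_layers(n_teacher: int, n_student: int) -> list[int]:
    """Indices of the n_student maximally spaced layers of an n_teacher-layer stack."""
    if n_student == 1:
        # Special case from the description: a 1-layer student copies layer 0.
        return [0]
    step = (n_teacher - 1) / (n_student - 1)
    # Always includes the first (0) and last (n_teacher - 1) layers;
    # ties between equally spaced candidates are broken here by rounding.
    return [round(i * step) for i in range(n_student)]

print(pick_layers(12, 3))  # [0, 6, 11], matching the BART example above
```

The selected indices would then be used to copy the corresponding decoder layers (and the full encoder) from the teacher into the student before fine-tuning resumes.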

Papers Using This Method

A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning (2025-07-11)
InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes under Herd Behavior (2025-07-09)
CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation (2025-07-08)
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs (2025-07-08)
Reinforcement Fine-Tuning Naturally Mitigates Forgetting in Continual Post-Training (2025-07-07)
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning (2025-06-30)
Complexity-aware fine-tuning (2025-06-26)
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning (2025-06-23)
Smart-LLaMA-DPO: Reinforced Large Language Model for Explainable Smart Contract Vulnerability Detection (2025-06-23)
RLRC: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models (2025-06-21)
Differentiation-Based Extraction of Proprietary Data from Fine-Tuned LLMs (2025-06-20)
Refine-POI: Reinforcement Fine-Tuned Large Language Models for Next Point-of-Interest Recommendation (2025-06-19)
Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality (2025-06-17)
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy (2025-06-16)
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning (2025-06-16)
Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning (2025-06-16)
QFFT, Question-Free Fine-Tuning for Adaptive Reasoning (2025-06-15)
VGR: Visual Grounded Reasoning (2025-06-13)
Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning (2025-06-11)
Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques (2025-06-09)