Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

Haoyuan Wu, Haisheng Zheng, Zhuolun He, Bei Yu

2024-01-05 · Question Answering · Math Word Problem Solving · Multi-task Language Understanding · Sentence Completion · Common Sense Reasoning · Arithmetic Reasoning · Code Generation
Paper · PDF · Code (official) · Code

Abstract

Large language models (LLMs) have demonstrated considerable proficiency in general natural language processing (NLP) tasks. Instruction tuning, a successful paradigm, enhances the ability of LLMs to follow natural language instructions and exhibit robust generalization across general tasks. However, these models often encounter performance limitations across multiple tasks due to constrained model capacity. Expanding this capacity during the instruction tuning phase poses significant challenges. To address this issue, we introduce parameter-efficient sparsity crafting (PESC), which crafts dense models into sparse models using the mixture-of-experts (MoE) architecture. PESC integrates adapters into the MoE layers of sparse models, differentiating experts without altering the individual weights within these layers. This method significantly reduces computational costs and GPU memory requirements, facilitating model capacity expansion through a minimal parameter increase while guaranteeing the quality of approximation in function space compared to original sparse upcycling. Our empirical evaluation demonstrates the effectiveness of the PESC method. Using PESC during instruction tuning, our best sparse model outperforms other sparse and dense models and exhibits superior general capabilities compared to GPT-3.5. Our code is available at https://github.com/wuhy68/Parameter-Efficient-MoE.
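The core idea in the abstract — experts that share the dense model's FFN weights and differ only through small per-expert adapters, selected by a top-k router — can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: all dimensions, the ReLU activation, the zero-initialized low-rank adapters, and the name `pesc_moe_layer` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, rank, top_k = 16, 32, 4, 2, 2

# Shared FFN weights: copies of the dense model's MLP, identical for every expert.
W_up = rng.standard_normal((d_model, d_ff)) * 0.02
W_down = rng.standard_normal((d_ff, d_model)) * 0.02

# Per-expert low-rank adapters: the only expert-specific parameters.
# Zero-initializing B makes all experts start out identical to the dense FFN.
A = rng.standard_normal((n_experts, d_model, rank)) * 0.02
B = np.zeros((n_experts, rank, d_model))

W_gate = rng.standard_normal((d_model, n_experts)) * 0.02  # router weights

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def pesc_moe_layer(x):
    """x: (tokens, d_model) -> (tokens, d_model)."""
    shared = np.maximum(x @ W_up, 0.0) @ W_down        # shared FFN (ReLU)
    logits = x @ W_gate                                 # router scores
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        idx = np.argsort(logits[t])[-top_k:]            # top-k experts per token
        w = softmax(logits[t, idx])
        for weight, e in zip(w, idx):
            # An "expert" is the shared FFN plus its own low-rank adapter.
            adapted = shared[t] + (x[t] @ A[e]) @ B[e]
            out[t] += weight * adapted
    return out

x = rng.standard_normal((3, d_model))
y = pesc_moe_layer(x)
print(y.shape)  # (3, 16)
```

Note the parameter-efficiency claim in miniature: the experts add only `n_experts * 2 * d_model * rank` adapter weights on top of one shared FFN, and with zero-initialized adapters the crafted sparse layer initially reproduces the dense layer's output exactly.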

Results

Task | Dataset | Metric | Value | Model
Question Answering | PIQA | Accuracy | 82.7 | Camelidae-8×34B
Question Answering | MATH | Accuracy | 29.9 | Qwen2idae-16x14B (4-shot)
Question Answering | MATH | Accuracy | 22.6 | Camelidae-8×34B (4-shot)
Code Generation | MBPP | Accuracy | 48.6 | Qwen2idae-16x14B (4-shot)
Code Generation | MBPP | Accuracy | 41.4 | Camelidae-8×34B (4-shot)
Common Sense Reasoning | WinoGrande | Accuracy | 80.9 | Camelidae-8×34B
Common Sense Reasoning | ARC (Challenge) | Accuracy | 65.2 | Camelidae-8×34B
Common Sense Reasoning | ARC (Easy) | Accuracy | 86.2 | Camelidae-8×34B
Math Word Problem Solving | MATH | Accuracy | 29.9 | Qwen2idae-16x14B (4-shot)
Math Word Problem Solving | MATH | Accuracy | 22.6 | Camelidae-8×34B (4-shot)
Mathematical Question Answering | MATH | Accuracy | 29.9 | Qwen2idae-16x14B (4-shot)
Mathematical Question Answering | MATH | Accuracy | 22.6 | Camelidae-8×34B (4-shot)
Mathematical Reasoning | MATH | Accuracy | 29.9 | Qwen2idae-16x14B (4-shot)
Mathematical Reasoning | MATH | Accuracy | 22.6 | Camelidae-8×34B (4-shot)
Sentence Completion | HellaSwag | Accuracy | 83.2 | Camelidae-8×34B (10-shot)
Sentence Completion | HellaSwag | Accuracy | 82.3 | Qwen2idae-16x14B (10-shot)
Arithmetic Reasoning | GSM8K | Accuracy | 78.3 | Camelidae-8×34B (5-shot)
Arithmetic Reasoning | GSM8K | Accuracy | 77.8 | Qwen2idae-16x14B (5-shot)

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts (2025-07-17)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)