Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts

Dengchun Li, Yingzi Ma, Naizheng Wang, Zhengmao Ye, Zhiyuan Cheng, Yinghao Tang, Yan Zhang, Lei Duan, Jie Zuo, Cai Yang, Mingjie Tang

2024-04-22 · Text Classification · Question Answering · Sentence Completion · Quantization · Common Sense Reasoning · Multi-Task Learning

Paper · PDF · Code (official)

Abstract

Fine-tuning Large Language Models (LLMs) is a common practice to adapt pre-trained models for specific applications. While methods like LoRA have effectively addressed GPU memory constraints during fine-tuning, their performance often falls short, especially in multi-task scenarios. In contrast, Mixture-of-Experts (MoE) models, such as Mixtral 8x7B, demonstrate remarkable performance in multi-task learning scenarios while maintaining a reduced parameter count. However, the resource requirements of these MoEs remain challenging, particularly for consumer-grade GPUs with less than 24GB memory. To tackle these challenges, we propose MixLoRA, an approach to construct a resource-efficient sparse MoE model based on LoRA. MixLoRA inserts multiple LoRA-based experts within the feed-forward network block of a frozen pre-trained dense model and employs a commonly used top-k router. Unlike other LoRA-based MoE methods, MixLoRA enhances model performance by utilizing independent attention-layer LoRA adapters. Additionally, an auxiliary load-balance loss is employed to address the imbalance problem of the router. Our evaluations show that MixLoRA improves accuracy by about 9% over state-of-the-art PEFT methods in multi-task learning scenarios. We also propose a new high-throughput framework to alleviate the computation and memory bottlenecks during the training and inference of MoE models. This framework reduces GPU memory consumption by 40% and token computation latency by 30% during both training and inference.
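The core mechanism the abstract describes — LoRA experts inserted alongside a frozen feed-forward weight, a top-k router that gates them per token, and an auxiliary load-balance loss over routing statistics — can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: all shapes, names (`MixLoRABlock`, `Wg`), and the specific load-balance formula (the Switch-Transformer-style `K * sum_k f_k * P_k`) are assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class MixLoRABlock:
    """Toy MixLoRA-style layer: a frozen dense weight W0 plus K LoRA experts
    (A_k, B_k), with a top-k router choosing experts per token.
    Hypothetical shapes and names; for illustration only."""
    def __init__(self, d_model, d_ff, num_experts=4, rank=8, top_k=2, alpha=16):
        self.W0 = rng.normal(0, 0.02, (d_model, d_ff))              # frozen pre-trained weight
        self.A = rng.normal(0, 0.02, (num_experts, d_model, rank))  # LoRA down-projections
        self.B = np.zeros((num_experts, rank, d_ff))                # LoRA up-projections (zero init)
        self.Wg = rng.normal(0, 0.02, (d_model, num_experts))       # router weights
        self.K, self.top_k, self.scale = num_experts, top_k, alpha / rank

    def forward(self, x):
        # x: (tokens, d_model) -> (tokens, d_ff), plus the auxiliary loss
        probs = softmax(x @ self.Wg)                        # router probabilities per token
        idx = np.argsort(-probs, axis=-1)[:, :self.top_k]   # top-k expert indices per token
        y = x @ self.W0                                     # frozen dense path
        counts = np.zeros(self.K)
        for t in range(x.shape[0]):
            sel = idx[t]
            gates = probs[t, sel] / probs[t, sel].sum()     # renormalize over selected experts
            for g, k in zip(gates, sel):
                y[t] += g * self.scale * (x[t] @ self.A[k] @ self.B[k])
                counts[k] += 1
        # Auxiliary load-balance loss (assumed form): K * sum_k f_k * P_k, where
        # f_k is the fraction of assignments to expert k and P_k its mean router prob.
        f = counts / (x.shape[0] * self.top_k)
        P = probs.mean(axis=0)
        aux = self.K * float((f * P).sum())
        return y, aux

block = MixLoRABlock(d_model=16, d_ff=32)
x = rng.normal(size=(5, 16))
y, aux = block.forward(x)
```

Because the `B` matrices are zero-initialized (standard LoRA practice), the block initially reproduces the frozen dense output exactly; training then moves only the expert and router parameters, which is what keeps the approach memory-efficient.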

Results

Task | Dataset | Metric | Value | Model
Question Answering | SIQA | Accuracy | 82.5 | LLaMA-2 13B + MixLoRA
Question Answering | SIQA | Accuracy | 78.8 | LLaMA-3 8B + MixLoRA
Question Answering | SIQA | Accuracy | 78 | LLaMA-2 7B + MixLoRA
Question Answering | PIQA | Accuracy | 87.6 | LLaMA-3 8B + MixLoRA
Question Answering | PIQA | Accuracy | 86.8 | LLaMA-2 13B + MixLoRA
Question Answering | PIQA | Accuracy | 83.2 | LLaMA-2 7B + MixLoRA
Question Answering | BoolQ | Accuracy | 77.1 | LLaMA-2 13B + MixLoRA
Question Answering | BoolQ | Accuracy | 75 | LLaMA-3 8B + MixLoRA
Question Answering | BoolQ | Accuracy | 72.7 | LLaMA-2 7B + MixLoRA
Question Answering | OpenBookQA | Accuracy | 84.8 | LLaMA-3 8B + MixLoRA
Question Answering | OpenBookQA | Accuracy | 83 | LLaMA-2 13B + MixLoRA
Question Answering | OpenBookQA | Accuracy | 81.6 | LLaMA-2 7B + MixLoRA
Common Sense Reasoning | WinoGrande | Accuracy | 86.3 | LLaMA-2 13B + MixLoRA
Common Sense Reasoning | WinoGrande | Accuracy | 82.1 | LLaMA-3 8B + MixLoRA
Common Sense Reasoning | WinoGrande | Accuracy | 76.8 | LLaMA-2 7B + MixLoRA
Common Sense Reasoning | ARC (Challenge) | Accuracy | 79.9 | LLaMA-3 8B + MixLoRA
Common Sense Reasoning | ARC (Challenge) | Accuracy | 69.9 | LLaMA-2 13B + MixLoRA
Common Sense Reasoning | ARC (Challenge) | Accuracy | 58.1 | LLaMA-2 7B + MixLoRA
Common Sense Reasoning | ARC (Easy) | Accuracy | 86.5 | LLaMA-3 8B + MixLoRA
Common Sense Reasoning | ARC (Easy) | Accuracy | 83.5 | LLaMA-2 13B + MixLoRA
Common Sense Reasoning | ARC (Easy) | Accuracy | 77.7 | LLaMA-2 7B + MixLoRA
Sentence Completion | HellaSwag | Accuracy | 94.7 | LLaMA-2 13B + MixLoRA
Sentence Completion | HellaSwag | Accuracy | 93.3 | LLaMA-3 8B + MixLoRA
Sentence Completion | HellaSwag | Accuracy | 93.1 | LLaMA-2 7B + MixLoRA

Related Papers

Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation (2025-09-04)
An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC (2025-07-18)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)