Dengchun Li, Yingzi Ma, Naizheng Wang, Zhengmao Ye, Zhiyuan Cheng, Yinghao Tang, Yan Zhang, Lei Duan, Jie Zuo, Cal Yang, Mingjie Tang
Fine-tuning Large Language Models (LLMs) is a common practice for adapting pre-trained models to specific applications. While methods like LoRA effectively address GPU memory constraints during fine-tuning, their performance often falls short, especially in multi-task scenarios. In contrast, Mixture-of-Experts (MoE) models, such as Mixtral 8x7B, demonstrate remarkable performance in multi-task learning while maintaining a reduced parameter count. However, the resource requirements of these MoEs remain challenging, particularly for consumer-grade GPUs with less than 24GB of memory. To tackle these challenges, we propose MixLoRA, an approach for constructing a resource-efficient sparse MoE model based on LoRA. MixLoRA inserts multiple LoRA-based experts within the feed-forward network block of a frozen pre-trained dense model and employs a commonly used top-k router. Unlike other LoRA-based MoE methods, MixLoRA enhances model performance with independent attention-layer LoRA adapters. Additionally, an auxiliary load-balance loss addresses the imbalance problem of the router. Our evaluations show that MixLoRA achieves about 9% higher accuracy than state-of-the-art PEFT methods in multi-task learning scenarios. We also propose a new high-throughput framework to alleviate the computation and memory bottlenecks during the training and inference of MoE models. This framework reduces GPU memory consumption by 40% and token computation latency by 30% during both training and inference.
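To make the architecture concrete, the following is a minimal numpy sketch of the idea described above: each "expert" is a LoRA delta applied on top of a shared frozen FFN weight, a top-k router mixes the selected experts, and a Switch-style auxiliary loss penalizes routing imbalance. All names and dimensions here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 8, 16            # toy dimensions for illustration
n_experts, top_k, rank = 4, 2, 2

W_frozen = rng.normal(size=(d_model, d_ff))                  # frozen dense FFN weight
lora_A = rng.normal(size=(n_experts, d_model, rank)) * 0.1   # trainable LoRA A per expert
lora_B = np.zeros((n_experts, rank, d_ff))                   # B initialized to zero (standard LoRA)
W_gate = rng.normal(size=(d_model, n_experts))               # router weights

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mixlora_ffn(x):
    """x: (tokens, d_model) -> (tokens, d_ff), plus router stats."""
    probs = softmax(x @ W_gate)                      # (tokens, n_experts)
    top = np.argsort(-probs, axis=-1)[:, :top_k]     # top-k expert indices per token
    out = np.zeros((x.shape[0], d_ff))
    for t in range(x.shape[0]):
        p = probs[t, top[t]]
        p = p / p.sum()                              # renormalize over the top-k experts
        for w, e in zip(p, top[t]):
            delta = lora_A[e] @ lora_B[e]            # low-rank update A·B for expert e
            out[t] += w * (x[t] @ (W_frozen + delta))
    return out, probs, top

def load_balance_loss(probs, top):
    """Auxiliary loss: n_experts * sum_i f_i * P_i, where f_i is the fraction
    of token-slots routed to expert i and P_i its mean router probability."""
    tokens = probs.shape[0]
    f = np.bincount(top.ravel(), minlength=n_experts) / (tokens * top_k)
    P = probs.mean(axis=0)
    return n_experts * float(f @ P)

x = rng.normal(size=(5, d_model))
y, probs, top = mixlora_ffn(x)
aux = load_balance_loss(probs, top)
```

Because `lora_B` starts at zero, the block initially reproduces the frozen model's output exactly; training would move only the LoRA and router parameters, which is what keeps the approach memory-efficient relative to a full MoE.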
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | SIQA | Accuracy | 82.5 | LLaMA-2 13B + MixLoRA |
| Question Answering | SIQA | Accuracy | 78.8 | LLaMA-3 8B + MixLoRA |
| Question Answering | SIQA | Accuracy | 78.0 | LLaMA-2 7B + MixLoRA |
| Question Answering | PIQA | Accuracy | 87.6 | LLaMA-3 8B + MixLoRA |
| Question Answering | PIQA | Accuracy | 86.8 | LLaMA-2 13B + MixLoRA |
| Question Answering | PIQA | Accuracy | 83.2 | LLaMA-2 7B + MixLoRA |
| Question Answering | BoolQ | Accuracy | 77.1 | LLaMA-2 13B + MixLoRA |
| Question Answering | BoolQ | Accuracy | 75.0 | LLaMA-3 8B + MixLoRA |
| Question Answering | BoolQ | Accuracy | 72.7 | LLaMA-2 7B + MixLoRA |
| Question Answering | OpenBookQA | Accuracy | 84.8 | LLaMA-3 8B + MixLoRA |
| Question Answering | OpenBookQA | Accuracy | 83.0 | LLaMA-2 13B + MixLoRA |
| Question Answering | OpenBookQA | Accuracy | 81.6 | LLaMA-2 7B + MixLoRA |
| Common Sense Reasoning | WinoGrande | Accuracy | 86.3 | LLaMA-2 13B + MixLoRA |
| Common Sense Reasoning | WinoGrande | Accuracy | 82.1 | LLaMA-3 8B + MixLoRA |
| Common Sense Reasoning | WinoGrande | Accuracy | 76.8 | LLaMA-2 7B + MixLoRA |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 79.9 | LLaMA-3 8B + MixLoRA |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 69.9 | LLaMA-2 13B + MixLoRA |
| Common Sense Reasoning | ARC (Challenge) | Accuracy | 58.1 | LLaMA-2 7B + MixLoRA |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 86.5 | LLaMA-3 8B + MixLoRA |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 83.5 | LLaMA-2 13B + MixLoRA |
| Common Sense Reasoning | ARC (Easy) | Accuracy | 77.7 | LLaMA-2 7B + MixLoRA |
| Sentence Completion | HellaSwag | Accuracy | 94.7 | LLaMA-2 13B + MixLoRA |
| Sentence Completion | HellaSwag | Accuracy | 93.3 | LLaMA-3 8B + MixLoRA |
| Sentence Completion | HellaSwag | Accuracy | 93.1 | LLaMA-2 7B + MixLoRA |