Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


CodeT5+: Open Code Large Language Models for Code Understanding and Generation

Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D. Q. Bui, Junnan Li, Steven C. H. Hoi

2023-05-13 · Math · Code Completion · Code Summarization · Arithmetic Reasoning · Code Search · Code Generation · HumanEval
Paper · PDF · Code (official) · Code

Abstract

Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. However, existing code LLMs have two main limitations in terms of architecture and pretraining tasks. First, they often adopt a specific architecture (encoder-only or decoder-only) or rely on a unified encoder-decoder network for different downstream tasks. The former paradigm is limited by inflexibility in applications, while in the latter the model is treated as a single system for all tasks, leading to suboptimal performance on a subset of tasks. Second, they often employ a limited set of pretraining objectives which might not be relevant to some downstream tasks and hence result in substantial performance degradation. To address these limitations, we propose "CodeT5+", a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks. Such flexibility is enabled by our proposed mixture of pretraining objectives, which mitigates the pretrain-finetune discrepancy. These objectives cover span denoising, contrastive learning, text-code matching, and causal LM pretraining tasks, on both unimodal and bimodal multilingual code corpora. Furthermore, we propose to initialize CodeT5+ with frozen off-the-shelf LLMs without training from scratch to efficiently scale up our models, and explore instruction tuning to align with natural language instructions. We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction tuning. We observe state-of-the-art (SoTA) performance on various code-related tasks, such as code generation and completion, math programming, and text-to-code retrieval. In particular, our instruction-tuned CodeT5+ 16B achieves new SoTA results on the HumanEval code generation task against other open code LLMs.
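Among the pretraining objectives listed above, span denoising follows the T5 recipe: contiguous token spans in the input are replaced with sentinel tokens, and the decoder learns to reproduce the masked spans. As a rough illustration (not the authors' implementation; corruption rate, span length, and sentinel naming here are simplified assumptions), a minimal sketch of T5-style span corruption might look like:

```python
import random

def span_corrupt(tokens, corruption_rate=0.15, mean_span_len=3, seed=0):
    """T5-style span corruption (simplified sketch).

    Replaces contiguous spans of `tokens` with sentinel tokens and returns
    (encoder_input, decoder_target). The decoder target lists each sentinel
    followed by the tokens it replaced.
    """
    rng = random.Random(seed)
    n = len(tokens)
    num_mask = max(1, round(n * corruption_rate))

    # Randomly choose positions to mask, grown in short contiguous runs.
    masked = set()
    while len(masked) < num_mask:
        start = rng.randrange(n)
        for i in range(start, min(n, start + mean_span_len)):
            masked.add(i)
            if len(masked) >= num_mask:
                break

    enc, dec = [], []
    sentinel = 0
    i = 0
    while i < n:
        if i in masked:
            tok = f"<extra_id_{sentinel}>"
            enc.append(tok)          # sentinel stands in for the span
            dec.append(tok)          # decoder target: sentinel + span tokens
            while i < n and i in masked:
                dec.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            enc.append(tokens[i])
            i += 1
    return enc, dec

code = "def add ( a , b ) : return a + b".split()
enc, dec = span_corrupt(code)
```

Every original token ends up exactly once in either the encoder input (unmasked) or the decoder target (masked), so the two sequences together losslessly encode the original.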

Results

Task | Dataset | Metric | Value | Model
Code Search | CodeXGLUE - AdvTest | MRR | 44.7 | CodeT5+ 770M
Code Search | CodeXGLUE - AdvTest | MRR | 43.3 | CodeT5+ 220M
Code Search | CodeSearchNet | Go | 92.7 | CodeT5+ 770M
Code Search | CodeSearchNet | JS | 71.3 | CodeT5+ 770M
Code Search | CodeSearchNet | Java | 76.2 | CodeT5+ 770M
Code Search | CodeSearchNet | Overall | 77.4 | CodeT5+ 770M
Code Search | CodeSearchNet | PHP | 70.1 | CodeT5+ 770M
Code Search | CodeSearchNet | Python | 75.8 | CodeT5+ 770M
Code Search | CodeSearchNet | Ruby | 78.0 | CodeT5+ 770M
Code Search | CodeSearchNet | Go | 92.4 | CodeT5+ 220M
Code Search | CodeSearchNet | Java | 76.1 | CodeT5+ 220M
Code Search | CodeSearchNet | Overall | 77.1 | CodeT5+ 220M
Code Search | CodeSearchNet | PHP | 69.8 | CodeT5+ 220M
Code Search | CodeSearchNet | Python | 75.6 | CodeT5+ 220M
Code Search | CodeSearchNet | Ruby | 77.7 | CodeT5+ 220M
Arithmetic Reasoning | GSM8K | Accuracy | 73.8 | CodeT5+
Arithmetic Reasoning | GSM8K | Parameters (Billion) | 0.77 | CodeT5+
Code Completion | CodeXGLUE - Github Java Corpus | EM (line-level) | 37.9 | CodeT5+ 770M
Code Completion | CodeXGLUE - Github Java Corpus | Edit Sim (line-level) | 72.25 | CodeT5+ 770M
Code Completion | CodeXGLUE - Github Java Corpus | EM (line-level) | 35.17 | CodeT5+ 220M
Code Completion | CodeXGLUE - Github Java Corpus | Edit Sim (line-level) | 69.48 | CodeT5+ 220M
Code Completion | CodeXGLUE - PY150 | EM (line-level) | 44.86 | CodeT5+ 770M
Code Completion | CodeXGLUE - PY150 | Edit Sim (line-level) | 74.22 | CodeT5+ 770M
Code Completion | CodeXGLUE - PY150 | EM (line-level) | 43.42 | CodeT5+ 220M
Code Completion | CodeXGLUE - PY150 | Edit Sim (line-level) | 73.69 | CodeT5+ 220M
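Two of the metrics in the table are easy to compute from first principles: MRR (code search) averages the reciprocal rank at which the correct snippet is retrieved for each query, and Edit Sim (line-level code completion) is typically a character-level edit-distance similarity between the predicted and reference line. A minimal sketch of both, under the assumption that Edit Sim is Levenshtein-based and reported as a percentage:

```python
def mean_reciprocal_rank(ranks):
    """MRR over 1-indexed ranks of the correct item for each query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def edit_similarity(pred, target):
    """Character-level edit similarity as a percentage:
    100 * (1 - levenshtein(pred, target) / max(len(pred), len(target)))."""
    m, n = len(pred), len(target)
    dp = list(range(n + 1))  # one-row DP over the Levenshtein table
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                          # deletion
                dp[j - 1] + 1,                      # insertion
                prev + (pred[i - 1] != target[j - 1]),  # substitution / match
            )
            prev = cur
    dist = dp[n]
    return 100.0 * (1.0 - dist / max(m, n, 1))
```

For example, if the correct snippet is ranked 1st, 2nd, and 4th for three queries, MRR is (1 + 1/2 + 1/4) / 3 ≈ 0.583, and an exact line-level prediction scores an edit similarity of 100.0.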

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts (2025-07-17)
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks (2025-07-16)
Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding (2025-07-15)
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing (2025-07-15)