Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


CodeT5+: Open Code Large Language Models for Code Understanding and Generation

Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D. Q. Bui, Junnan Li, Steven C. H. Hoi

2023-05-13 · Math · Code Completion · Code Summarization · Arithmetic Reasoning · Code Search · Code Generation · HumanEval
Paper · PDF · Code (official) · Code

Abstract

Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. However, existing code LLMs have two main limitations in terms of architecture and pretraining tasks. First, they often adopt a specific architecture (encoder-only or decoder-only) or rely on a unified encoder-decoder network for different downstream tasks. The former paradigm is limited by inflexibility in applications, while in the latter the model is treated as a single system for all tasks, leading to suboptimal performance on a subset of tasks. Second, they often employ a limited set of pretraining objectives which might not be relevant to some downstream tasks and hence result in substantial performance degradation. To address these limitations, we propose "CodeT5+", a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks. Such flexibility is enabled by our proposed mixture of pretraining objectives, which mitigates the pretrain-finetune discrepancy. These objectives cover span denoising, contrastive learning, text-code matching, and causal LM pretraining tasks, on both unimodal and bimodal multilingual code corpora. Furthermore, we propose to initialize CodeT5+ with frozen off-the-shelf LLMs without training from scratch to efficiently scale up our models, and explore instruction tuning to align with natural language instructions. We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction tuning. We observe state-of-the-art (SoTA) performance on various code-related tasks, such as code generation and completion, math programming, and text-to-code retrieval. In particular, our instruction-tuned CodeT5+ 16B achieves new SoTA results on the HumanEval code generation task against other open code LLMs.
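Among the pretraining objectives listed above, span denoising follows the T5 recipe: contiguous token spans in the input are replaced with sentinel tokens, and the decoder learns to reproduce the masked spans. As a rough illustration (not the authors' implementation; corruption rate, span length, and sentinel naming here are simplified assumptions), a minimal sketch of T5-style span corruption might look like:

```python
import random

def span_corrupt(tokens, corruption_rate=0.15, mean_span_len=3, seed=0):
    """T5-style span corruption (simplified sketch).

    Replaces contiguous spans of `tokens` with sentinel tokens and returns
    (encoder_input, decoder_target). The decoder target lists each sentinel
    followed by the tokens it replaced.
    """
    rng = random.Random(seed)
    n = len(tokens)
    num_mask = max(1, round(n * corruption_rate))

    # Randomly choose positions to mask, grown in short contiguous runs.
    masked = set()
    while len(masked) < num_mask:
        start = rng.randrange(n)
        for i in range(start, min(n, start + mean_span_len)):
            masked.add(i)
            if len(masked) >= num_mask:
                break

    enc, dec = [], []
    sentinel = 0
    i = 0
    while i < n:
        if i in masked:
            tok = f"<extra_id_{sentinel}>"
            enc.append(tok)          # sentinel stands in for the span
            dec.append(tok)          # decoder target: sentinel + span tokens
            while i < n and i in masked:
                dec.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            enc.append(tokens[i])
            i += 1
    return enc, dec

code = "def add ( a , b ) : return a + b".split()
enc, dec = span_corrupt(code)
```

Every original token ends up exactly once in either the encoder input (unmasked) or the decoder target (masked), so the two sequences together losslessly encode the original.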

Results

Task | Dataset | Metric | Value | Model
Code Search | CodeXGLUE - AdvTest | MRR | 44.7 | CodeT5+ 770M
Code Search | CodeXGLUE - AdvTest | MRR | 43.3 | CodeT5+ 220M
Code Search | CodeSearchNet | Go | 92.7 | CodeT5+ 770M
Code Search | CodeSearchNet | JS | 71.3 | CodeT5+ 770M
Code Search | CodeSearchNet | Java | 76.2 | CodeT5+ 770M
Code Search | CodeSearchNet | Overall | 77.4 | CodeT5+ 770M
Code Search | CodeSearchNet | PHP | 70.1 | CodeT5+ 770M
Code Search | CodeSearchNet | Python | 75.8 | CodeT5+ 770M
Code Search | CodeSearchNet | Ruby | 78.0 | CodeT5+ 770M
Code Search | CodeSearchNet | Go | 92.4 | CodeT5+ 220M
Code Search | CodeSearchNet | Java | 76.1 | CodeT5+ 220M
Code Search | CodeSearchNet | Overall | 77.1 | CodeT5+ 220M
Code Search | CodeSearchNet | PHP | 69.8 | CodeT5+ 220M
Code Search | CodeSearchNet | Python | 75.6 | CodeT5+ 220M
Code Search | CodeSearchNet | Ruby | 77.7 | CodeT5+ 220M
Arithmetic Reasoning | GSM8K | Accuracy | 73.8 | CodeT5+
Arithmetic Reasoning | GSM8K | Parameters (Billion) | 0.77 | CodeT5+
Code Completion | CodeXGLUE - Github Java Corpus | EM (line-level) | 37.9 | CodeT5+ 770M
Code Completion | CodeXGLUE - Github Java Corpus | Edit Sim (line-level) | 72.25 | CodeT5+ 770M
Code Completion | CodeXGLUE - Github Java Corpus | EM (line-level) | 35.17 | CodeT5+ 220M
Code Completion | CodeXGLUE - Github Java Corpus | Edit Sim (line-level) | 69.48 | CodeT5+ 220M
Code Completion | CodeXGLUE - PY150 | EM (line-level) | 44.86 | CodeT5+ 770M
Code Completion | CodeXGLUE - PY150 | Edit Sim (line-level) | 74.22 | CodeT5+ 770M
Code Completion | CodeXGLUE - PY150 | EM (line-level) | 43.42 | CodeT5+ 220M
Code Completion | CodeXGLUE - PY150 | Edit Sim (line-level) | 73.69 | CodeT5+ 220M
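Two of the metrics in the table are easy to compute from first principles: MRR (code search) averages the reciprocal rank at which the correct snippet is retrieved for each query, and Edit Sim (line-level code completion) is typically a character-level edit-distance similarity between the predicted and reference line. A minimal sketch of both, under the assumption that Edit Sim is Levenshtein-based and reported as a percentage:

```python
def mean_reciprocal_rank(ranks):
    """MRR over 1-indexed ranks of the correct item for each query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def edit_similarity(pred, target):
    """Character-level edit similarity as a percentage:
    100 * (1 - levenshtein(pred, target) / max(len(pred), len(target)))."""
    m, n = len(pred), len(target)
    dp = list(range(n + 1))  # one-row DP over the Levenshtein table
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                          # deletion
                dp[j - 1] + 1,                      # insertion
                prev + (pred[i - 1] != target[j - 1]),  # substitution / match
            )
            prev = cur
    dist = dp[n]
    return 100.0 * (1.0 - dist / max(m, n, 1))
```

For example, if the correct snippet is ranked 1st, 2nd, and 4th for three queries, MRR is (1 + 1/2 + 1/4) / 3 ≈ 0.583, and an exact line-level prediction scores an edit similarity of 100.0.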

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks (2025-07-17)
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation (2025-07-17)
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts (2025-07-17)
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks (2025-07-16)
Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding (2025-07-15)
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing (2025-07-15)