CodeBERT: A Pre-Trained Model for Programming and Natural Languages

Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou

2020-02-19 · Findings of the Association for Computational Linguistics 2020 · Code Documentation Generation

Abstract

We present CodeBERT, a bimodal pre-trained model for programming language (PL) and natural language (NL). CodeBERT learns general-purpose representations that support downstream NL-PL applications such as natural language code search, code documentation generation, etc. We develop CodeBERT with Transformer-based neural architecture, and train it with a hybrid objective function that incorporates the pre-training task of replaced token detection, which is to detect plausible alternatives sampled from generators. This enables us to utilize both bimodal data of NL-PL pairs and unimodal data, where the former provides input tokens for model training while the latter helps to learn better generators. We evaluate CodeBERT on two NL-PL applications by fine-tuning model parameters. Results show that CodeBERT achieves state-of-the-art performance on both natural language code search and code documentation generation tasks. Furthermore, to investigate what type of knowledge is learned in CodeBERT, we construct a dataset for NL-PL probing, and evaluate in a zero-shot setting where parameters of pre-trained models are fixed. Results show that CodeBERT performs better than previous pre-trained models on NL-PL probing.
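The replaced token detection task mentioned in the abstract can be illustrated with a toy sketch. The function name `make_rtd_example`, the `sample_replacement` callback (a hypothetical stand-in for the generator), and the label convention (1 = original token, 0 = replaced) are assumptions made for illustration; this is not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple


def make_rtd_example(
    tokens: Sequence[str],
    mask_positions: Sequence[int],
    sample_replacement: Callable[[Sequence[str], int], str],
) -> Tuple[List[str], List[int]]:
    """Build one replaced-token-detection training example.

    sample_replacement is a hypothetical stand-in for a generator that
    proposes a plausible token for position i given the original tokens.
    """
    corrupted = list(tokens)
    for i in mask_positions:
        corrupted[i] = sample_replacement(tokens, i)
    # Assumed convention: 1 if the token is still the original, 0 if replaced.
    # A sampled token that happens to equal the original counts as original.
    labels = [int(c == o) for c, o in zip(corrupted, tokens)]
    return corrupted, labels


if __name__ == "__main__":
    code_tokens = ["def", "max", "(", "a", ",", "b", ")"]
    # Toy generator that always proposes "max", whatever the position.
    corrupted, labels = make_rtd_example(code_tokens, [1, 3], lambda toks, i: "max")
    print(corrupted)
    print(labels)  # [1, 1, 1, 0, 1, 1, 1]
```

In this toy run, position 1 is sampled but receives the same token as the original, so it keeps label 1; only position 3 is marked as replaced. A discriminator would then be trained to predict these labels for every position.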

Results

Results listed under Code Documentation Generation are also indexed, with identical values, under the Text Generation and Code Generation tasks; results listed under Type prediction are also indexed under Program Synthesis.

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Code Documentation Generation | CodeSearchNet | Smoothed BLEU-4 | 15.99 | CodeBERT (MLM+RTD) |
| Code Documentation Generation | CodeSearchNet | Smoothed BLEU-4 | 15.55 | CodeBERT (MLM) |
| Code Documentation Generation | CodeSearchNet | Smoothed BLEU-4 | 15.15 | pre-train w/ code only |
| Code Documentation Generation | CodeSearchNet | Smoothed BLEU-4 | 15.03 | CodeBERT (RTD) |
| Code Documentation Generation | CodeSearchNet | Smoothed BLEU-4 | 14.52 | RoBERTa |
| Code Documentation Generation | CodeSearchNet | Smoothed BLEU-4 | 14.31 | Transformer |
| Code Documentation Generation | CodeSearchNet | Smoothed BLEU-4 | 13.36 | seq2seq |
| Code Documentation Generation | CodeSearchNet - Python | Smoothed BLEU-4 | 15.48 | CodeBERT (MLM) |
| Code Documentation Generation | CodeSearchNet - Python | Smoothed BLEU-4 | 15.41 | CodeBERT (MLM+RTD) |
| Code Documentation Generation | CodeSearchNet - Python | Smoothed BLEU-4 | 15.12 | pre-train w/ code only |
| Code Documentation Generation | CodeSearchNet - Python | Smoothed BLEU-4 | 14.92 | RoBERTa |
| Code Documentation Generation | CodeSearchNet - Python | Smoothed BLEU-4 | 13.44 | Transformer |
| Code Documentation Generation | CodeSearchNet - Python | Smoothed BLEU-4 | 13.04 | seq2seq |
| Code Documentation Generation | CodeSearchNet - Go | Smoothed BLEU-4 | 26.79 | CodeBERT (MLM) |
| Code Documentation Generation | CodeSearchNet - Go | Smoothed BLEU-4 | 26.66 | CodeBERT (MLM+RTD) |
| Code Documentation Generation | CodeSearchNet - Go | Smoothed BLEU-4 | 26.39 | pre-train w/ code only |
| Code Documentation Generation | CodeSearchNet - Go | Smoothed BLEU-4 | 26.09 | RoBERTa |
| Code Documentation Generation | CodeSearchNet - Go | Smoothed BLEU-4 | 26.02 | CodeBERT (RTD) |
| Code Documentation Generation | CodeSearchNet - Go | Smoothed BLEU-4 | 23.48 | seq2seq |
| Code Documentation Generation | CodeSearchNet - JavaScript | Smoothed BLEU-4 | 25.61 | Transformer |
| Code Documentation Generation | CodeSearchNet - JavaScript | Smoothed BLEU-4 | 9.54 | CodeBERT (MLM+RTD) |
| Code Documentation Generation | CodeSearchNet - JavaScript | Smoothed BLEU-4 | 8.73 | CodeBERT (RTD) |
| Code Documentation Generation | CodeSearchNet - JavaScript | Smoothed BLEU-4 | 8.51 | CodeBERT (MLM) |
| Code Documentation Generation | CodeSearchNet - JavaScript | Smoothed BLEU-4 | 8.3 | pre-train w/ code only |
| Code Documentation Generation | CodeSearchNet - JavaScript | Smoothed BLEU-4 | 6.88 | seq2seq |
| Code Documentation Generation | CodeSearchNet - JavaScript | Smoothed BLEU-4 | 5.72 | RoBERTa |
| Code Documentation Generation | CodeSearchNet - Php | Smoothed BLEU-4 | 21.32 | CodeBERT (MLM+RTD) |
| Code Documentation Generation | CodeSearchNet - Php | Smoothed BLEU-4 | 21 | CodeBERT (MLM) |
| Code Documentation Generation | CodeSearchNet - Php | Smoothed BLEU-4 | 20.71 | pre-train w/ code only |
| Code Documentation Generation | CodeSearchNet - Php | Smoothed BLEU-4 | 20.25 | CodeBERT (RTD) |
| Code Documentation Generation | CodeSearchNet - Php | Smoothed BLEU-4 | 19.9 | RoBERTa |
| Code Documentation Generation | CodeSearchNet - Php | Smoothed BLEU-4 | 18.4 | seq2seq |
| Code Documentation Generation | CodeSearchNet - Php | Smoothed BLEU-4 | 18.25 | Transformer |
| Code Documentation Generation | CodeSearchNet - Java | Smoothed BLEU-4 | 14.56 | CodeBERT (MLM+RTD) |
| Code Documentation Generation | CodeSearchNet - Java | Smoothed BLEU-4 | 13.59 | CodeBERT (MLM) |
| Code Documentation Generation | CodeSearchNet - Java | Smoothed BLEU-4 | 13.2 | RoBERTa |
| Code Documentation Generation | CodeSearchNet - Java | Smoothed BLEU-4 | 13.07 | pre-train w/ code only |
| Code Documentation Generation | CodeSearchNet - Java | Smoothed BLEU-4 | 12.72 | CodeBERT (RTD) |
| Code Documentation Generation | CodeSearchNet - Java | Smoothed BLEU-4 | 12.57 | Transformer |
| Code Documentation Generation | CodeSearchNet - Java | Smoothed BLEU-4 | 11.42 | seq2seq |
| Code Documentation Generation | CodeSearchNet - Ruby | Smoothed BLEU-4 | 8.46 | CodeBERT (MLM+RTD) |
| Code Documentation Generation | CodeSearchNet - Ruby | Smoothed BLEU-4 | 7.95 | CodeBERT (MLM) |
| Code Documentation Generation | CodeSearchNet - Ruby | Smoothed BLEU-4 | 7.87 | Transformer |
| Code Documentation Generation | CodeSearchNet - Ruby | Smoothed BLEU-4 | 7.36 | pre-train w/ code only |
| Code Documentation Generation | CodeSearchNet - Ruby | Smoothed BLEU-4 | 7.26 | RoBERTa |
| Code Documentation Generation | CodeSearchNet - Ruby | Smoothed BLEU-4 | 6.96 | seq2seq |
| Code Search | CodeSearchNet | Go | 69.3 | CodeBERT |
| Code Search | CodeSearchNet | JS | 74.8 | CodeBERT |
| Code Search | CodeSearchNet | Java | 86.8 | CodeBERT |
| Code Search | CodeSearchNet | Overall | 76 | CodeBERT |
| Code Search | CodeSearchNet | PHP | 70.6 | CodeBERT |
| Code Search | CodeSearchNet | Python | 84 | CodeBERT |
| Code Search | CodeSearchNet | Ruby | 70.6 | CodeBERT |
| Type prediction | ManyTypes4TypeScript | Average Accuracy | 61.72 | CodeBERT |
| Type prediction | ManyTypes4TypeScript | Average F1 | 59.57 | CodeBERT |
| Type prediction | ManyTypes4TypeScript | Average Precision | 59.34 | CodeBERT |
| Type prediction | ManyTypes4TypeScript | Average Recall | 59.8 | CodeBERT |
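For readers unfamiliar with the Smoothed BLEU-4 metric reported above, the following is a minimal sketch of a sentence-level BLEU-4 with add-one smoothing on the 2- to 4-gram precisions. The smoothing variant, the whitespace tokenization, the 0 to 100 scale, and the function names are assumptions; the evaluation script behind the reported numbers may differ in all of these.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_bleu4(hypothesis: str, reference: str) -> float:
    """Sentence-level BLEU-4 on a 0-100 scale, single reference.

    Assumed smoothing: add one to the numerator and denominator of the
    2-, 3- and 4-gram precisions; the unigram precision is left unsmoothed.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, 5):
        hyp_counts, ref_counts = ngram_counts(hyp, n), ngram_counts(ref, n)
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n == 1:
            if matches == 0:
                return 0.0  # no word overlap at all
            precision = matches / total
        else:
            precision = (matches + 1) / (total + 1)
        log_precision_sum += math.log(precision) / 4
    # Brevity penalty for hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return 100 * brevity_penalty * math.exp(log_precision_sum)


if __name__ == "__main__":
    reference = "returns the sum of two numbers"
    print(smoothed_bleu4("returns the sum of two numbers", reference))
    print(smoothed_bleu4("returns the sum", reference))
```

The smoothing keeps short generated docstrings from scoring zero merely because they contain no matching 4-gram, which is why a smoothed variant is commonly preferred for sentence-level evaluation of short outputs.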

Related Papers

DocAgent: A Multi-Agent System for Automated Code Documentation Generation (2025-04-11)
RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation (2024-02-26)
A Comparative Analysis of Large Language Models for Code Documentation Generation (2023-12-16)
Assemble Foundation Models for Automatic Code Summarization (2022-01-13)
Memorization and Generalization in Neural Code Intelligence Models (2021-06-16)
CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing (2021-04-06)
HAConvGNN: Hierarchical Attention Based Convolutional Graph Neural Network for Code Documentation Generation in Jupyter Notebooks (2021-03-31)