Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Shuai Lu, Daya Guo, Shuo Ren, JunJie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu

2021-02-09

Tasks: Cloze Test, Text-to-Code Generation, Code Translation, Code Completion, Defect Detection, Clone Detection, Document Translation, Code Summarization, Code Repair, Code Search, Code Generation, BIG-bench Machine Learning

Abstract

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, including the BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Code Generation | CodeXGLUE - CodeTrans | Accuracy (C#→Java) | 58 | CodeBERT |
| Code Generation | CodeXGLUE - CodeTrans | Accuracy (Java→C#) | 59 | CodeBERT |
| Code Generation | CodeXGLUE - CodeTrans | BLEU (C#→Java) | 72.14 | CodeBERT |
| Code Generation | CodeXGLUE - CodeTrans | BLEU (Java→C#) | 79.92 | CodeBERT |
| Code Generation | CodeXGLUE - CodeTrans | CodeBLEU (C#→Java) | 79.41 | CodeBERT |
| Code Generation | CodeXGLUE - CodeTrans | CodeBLEU (Java→C#) | 85.1 | CodeBERT |
| Code Search | CodeXGLUE - AdvTest | MRR | 27.19 | CodeBERT |
| Code Search | CodeXGLUE - WebQueryTest | Accuracy | 47.8 | CodeBERT |
| Code Search | CodeXGLUE - WebQueryTest | F1 | 58.95 | CodeBERT |
| Cloze Test | CodeXGLUE - CT-all | Go | 83.31 | CodeBERT(MLM) |
| Cloze Test | CodeXGLUE - CT-all | JS | 81.77 | CodeBERT(MLM) |
| Cloze Test | CodeXGLUE - CT-all | Java | 80.63 | CodeBERT(MLM) |
| Cloze Test | CodeXGLUE - CT-all | PHP | 85.05 | CodeBERT(MLM) |
| Cloze Test | CodeXGLUE - CT-all | Python | 87.21 | CodeBERT(MLM) |
| Cloze Test | CodeXGLUE - CT-all | Ruby | 80.17 | CodeBERT(MLM) |
| Cloze Test | CodeXGLUE - CT-maxmin | Go | 90.79 | CodeBERT(MLM) |
| Cloze Test | CodeXGLUE - CT-maxmin | JS | 86.4 | CodeBERT(MLM) |
| Cloze Test | CodeXGLUE - CT-maxmin | Java | 90.46 | CodeBERT(MLM) |
| Cloze Test | CodeXGLUE - CT-maxmin | PHP | 88.21 | CodeBERT(MLM) |
| Cloze Test | CodeXGLUE - CT-maxmin | Python | 82.2 | CodeBERT(MLM) |
| Cloze Test | CodeXGLUE - CT-maxmin | Ruby | 86.84 | CodeBERT(MLM) |
| Text-to-Code Generation | CodeXGLUE - CONCODE | BLEU | 32.79 | CodeGPT-adapted |
| Text-to-Code Generation | CodeXGLUE - CONCODE | CodeBLEU | 27.74 | CodeGPT-adapted |
| Text-to-Code Generation | CodeXGLUE - CONCODE | EM | 20.1 | CodeGPT-adapted |
| Code Repair | CodeXGLUE - Bugs2Fix | Accuracy (medium) | 5.2 | CodeBERT |
| Code Repair | CodeXGLUE - Bugs2Fix | Accuracy (small) | 16.4 | CodeBERT |
| Code Repair | CodeXGLUE - Bugs2Fix | BLEU (medium) | 91.07 | CodeBERT |
| Code Repair | CodeXGLUE - Bugs2Fix | BLEU (small) | 77.42 | CodeBERT |
| Code Repair | CodeXGLUE - Bugs2Fix | CodeBLEU (medium) | 87.52 | CodeBERT |
| Code Repair | CodeXGLUE - Bugs2Fix | CodeBLEU (small) | 75.58 | CodeBERT |
| Code Completion | CodeXGLUE - Github Java Corpus | Accuracy (token-level) | 77.13 | CodeGPT-adapted |
| Code Completion | CodeXGLUE - Github Java Corpus | EM (line-level) | 26.43 | CodeGPT-adapted |
| Code Completion | CodeXGLUE - Github Java Corpus | Edit Sim (line-level) | 63.03 | CodeGPT-adapted |
| Code Completion | CodeXGLUE - PY150 | Accuracy (token-level) | 75.11 | CodeGPT-adapted |
| Code Completion | CodeXGLUE - PY150 | EM (line-level) | 39.65 | CodeGPT-adapted |
| Code Completion | CodeXGLUE - PY150 | Edit Sim (line-level) | 69.84 | CodeGPT-adapted |
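
Several of the metrics reported above (EM, MRR, Edit Sim) are simple to state in code. The sketch below is illustrative only and is not the benchmark's official scorer: the function names are hypothetical, scores are scaled to 0-100 to match the table, and `difflib.SequenceMatcher` is used as a stand-in for whatever string-similarity measure the official evaluation scripts apply, so its values may differ from the reported ones.

```python
from difflib import SequenceMatcher


def exact_match(predictions, references):
    """Percentage of predictions equal to their reference after trimming whitespace."""
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over queries, scaled to 0-100.

    Each entry is the 1-based rank of the correct item for one query;
    a rank of 0 means the correct item was not retrieved and contributes 0.
    """
    if not ranks:
        return 0.0
    return 100.0 * sum(1.0 / r for r in ranks if r > 0) / len(ranks)


def edit_similarity(prediction, reference):
    """Character-level similarity in 0-100 (assumption: SequenceMatcher ratio as a proxy)."""
    return 100.0 * SequenceMatcher(None, prediction, reference).ratio()


if __name__ == "__main__":
    print(exact_match(["a = 1", "b = 2"], ["a = 1", "b = 3"]))
    print(mean_reciprocal_rank([1, 2, 4]))
    print(edit_similarity("return x + 1", "return x + 2"))
```

Under these definitions, a line-level completion can score high on edit similarity while still failing exact match, which is consistent with the gap between the EM and Edit Sim rows in the table.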

Related Papers

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning (2025-07-18)
Towards Formal Verification of LLM-Generated Code from Natural Language Prompts (2025-07-17)
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks (2025-07-16)
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training (2025-07-16)
Function-to-Style Guidance of LLMs for Code Translation (2025-07-15)
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs (2025-07-15)
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks (2025-07-14)
Kodezi Chronos: A Debugging-First Language Model for Repository-Scale, Memory-Driven Code Understanding (2025-07-14)