


Learning and Evaluating Contextual Embedding of Source Code

Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi

2019-12-21 | ICML 2020 | Program Repair | Natural Language Understanding | Contextual Embedding for Source Code

Abstract

Recent research has achieved impressive results on understanding and improving source code by building on machine-learning techniques developed for natural languages. A significant advancement in natural-language understanding has come with the development of pre-trained contextual embeddings, such as BERT, which can be fine-tuned for downstream tasks with less labeled data and training budget, while achieving better accuracies. However, there has been no attempt yet to obtain a high-quality contextual embedding of source code, and to evaluate it on multiple program-understanding tasks simultaneously; that is the gap that this paper aims to fill. Specifically, first, we curate a massive, deduplicated corpus of 7.4M Python files from GitHub, which we use to pre-train CuBERT, an open-sourced code-understanding BERT model; and, second, we create an open-sourced benchmark that comprises five classification tasks and one program-repair task, akin to code-understanding tasks proposed in the literature before. We fine-tune CuBERT on our benchmark tasks, and compare the resulting models to different variants of Word2Vec token embeddings, BiLSTM and Transformer models, as well as published state-of-the-art models, showing that CuBERT outperforms them all, even with shorter training, and with fewer labeled examples. Future work on source-code embedding can benefit from reusing our benchmark, and from comparing against CuBERT models as a strong baseline.
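
The abstract mentions curating a deduplicated corpus of Python files but does not say how duplicates were detected. As a rough illustration only, the sketch below removes exact duplicates by hashing each file's text after stripping trailing whitespace. This is an assumed, simplified stand-in and not the authors' procedure; the function names `content_key` and `deduplicate` and the toy corpus are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def content_key(source: str) -> str:
    """Hash a file's text after stripping trailing whitespace on each line."""
    normalized = "\n".join(line.rstrip() for line in source.splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(files: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first (path, source) pair seen for each distinct content key."""
    seen: Dict[str, str] = {}  # content key -> path of the first file with that key
    kept: List[Tuple[str, str]] = []
    for path, source in files:
        key = content_key(source)
        if key not in seen:
            seen[key] = path
            kept.append((path, source))
    return kept


# Hypothetical toy corpus: the second file differs from the first
# only by trailing spaces, so it is treated as a duplicate.
corpus = [
    ("a/util.py", "def f(x):\n    return x + 1\n"),
    ("b/util_copy.py", "def f(x):   \n    return x + 1\n"),
    ("c/other.py", "def g(x):\n    return x * 2\n"),
]
unique = deduplicate(corpus)
print([path for path, _ in unique])  # ['a/util.py', 'c/other.py']
```

Exact hashing only catches copies that are identical up to trailing whitespace; files that differ by renamed identifiers or small edits would need a near-duplicate method, which this sketch does not attempt.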
