How Does Naming Affect LLMs on Code Analysis Tasks?

Zhilong Wang, Lan Zhang, Chen Cao, Nanqing Luo, Xinzhi Luo, Peng Liu

2023-07-24

Representation Learning, Large Language Model, Code Generation, Language Modelling

Abstract

Large Language Models (LLMs) such as GPT and BERT were proposed for natural language processing (NLP) and have shown promising results as general-purpose language models. A growing number of industry professionals and researchers are adopting LLMs for program analysis tasks. However, one significant difference between programming languages and natural languages is that a programmer is free to assign any name to the variables, methods, and functions in a program, whereas a natural language writer is not. Intuitively, the quality of naming in a program affects the performance of LLMs on program analysis tasks. This paper investigates how naming affects LLMs on code analysis tasks. Specifically, we create a set of datasets in which the code contains nonsense or misleading names for variables, methods, and functions, respectively. We then use well-trained models (CodeBERT) to perform code analysis tasks on these datasets. The experimental results show that naming has a significant impact on the performance of LLM-based code analysis, indicating that code representation learning based on LLMs relies heavily on well-defined names in code. Additionally, we conduct a case study on some special code analysis tasks using GPT, which provides further insights.
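To make the dataset construction concrete, the following is a minimal sketch of how identifiers in a Python snippet could be replaced with nonsense names while keeping the program's behavior intact. It is not the authors' tooling: the abstract does not describe their procedure, so the helper `rename_identifiers`, the placeholder scheme `v0, v1, ...`, and the sample function are all hypothetical. The sketch relies on the standard `ast` module (`ast.unparse` needs Python 3.9 or later) and deliberately ignores keyword arguments, attributes, imports, and `global`/`nonlocal` declarations.

```python
import ast


def collect_defined_names(tree):
    """Collect names the snippet itself defines: functions, parameters, assigned variables."""
    names = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names.append(node.name)
        elif isinstance(node, ast.arg):
            names.append(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.append(node.id)
    return names


class _Renamer(ast.NodeTransformer):
    """Apply a fixed old-name -> new-name mapping; unmapped names (e.g. builtins) are kept."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def rename_identifiers(source):
    """Return (renamed_source, mapping). Assumes the source does not already use v0, v1, ..."""
    tree = ast.parse(source)
    defined = dict.fromkeys(collect_defined_names(tree))  # de-duplicate, keep order
    mapping = {name: f"v{i}" for i, name in enumerate(defined)}
    renamed_tree = _Renamer(mapping).visit(tree)
    return ast.unparse(renamed_tree), mapping


# Hypothetical sample input, used only for illustration.
SAMPLE = '''
def average(values):
    total = 0
    for item in values:
        total += item
    return total / len(values)
'''

renamed, mapping = rename_identifiers(SAMPLE)
print(renamed)
```

The same mapping step could produce misleading rather than nonsense names, for example by permuting the original identifiers among themselves. That variant is likewise an assumption about how such a dataset might be built.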

Results

Task	Dataset	Metric	Value	Model
Code Generation	MBPP	Accuracy	87.5	GPT-4 (ChatGPT Plus)
Code Generation	MBPP	Accuracy	83.2	GPT-3.5 Turbo (ChatGPT)
Code Generation	MBPP	Accuracy	82	GPT-4 (Bing Chat)
Code Generation	MBPP	Accuracy	76.2	Bard (PaLM 2/chat-bison-001)
Code Generation	MBPP	Accuracy	71.4	Claude
