Modeling Code: Is Text All You Need?

Daniel Nichols, Konstantinos Parasyris, Harshitha Menon, Brian R. Bartoldson, Giorgis Georgakoudis, Tal Ben-Nun, Abhinav Bhatele

2025-07-15

Abstract

Code LLMs have recently become extremely popular for modeling source code across a variety of tasks, such as generation, translation, and summarization. However, transformer-based models are limited in their ability to reason about structured, analytical properties of code, such as control and data flow. Previous work has explored modeling these properties with structured data and graph neural networks, but these approaches lack the generative capabilities and scale of modern LLMs. In this work, we introduce a novel approach that combines the strengths of modeling code both as text and in more structured forms.
