Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


CodeT5

Natural Language Processing · Introduced 2021 · 32 papers
Source Paper

Description

CodeT5 is a Transformer-based model for code understanding and generation built on the T5 architecture. It uses an identifier-aware pre-training objective that exploits a crucial kind of token type information in code: the identifiers assigned by developers. Specifically, T5's denoising Seq2Seq objective is extended with two tasks, identifier tagging and masked identifier prediction, so the model learns to distinguish and recover identifiers in programming languages. To improve the alignment between natural language and programming language, a bimodal dual generation objective is also used, training the model to convert in both directions between code and its natural language description.
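To make the masked identifier prediction task concrete, here is a minimal, hypothetical sketch of how identifiers in a snippet can be replaced with T5-style sentinel tokens, with the target sequence mapping each sentinel back to the original name. The function name `mask_identifiers`, the `KEYWORDS` set, and the regex-based tokenization are illustrative assumptions, not the paper's actual implementation (which operates on subword-tokenized code with parser-derived identifier labels).

```python
import re

# Illustrative keyword list so language keywords are not masked;
# a real implementation would use a parser's token types instead.
KEYWORDS = {"def", "return", "if", "else", "for", "in", "while", "print"}

def mask_identifiers(code: str):
    """Replace each distinct identifier with a T5-style sentinel token.

    Every occurrence of the same identifier maps to the same sentinel,
    mirroring CodeT5's masked identifier prediction objective. Returns
    the masked source and the target string the model learns to generate
    (each sentinel followed by the identifier it hides).
    """
    sentinel_of = {}  # identifier -> sentinel token, in order of first use

    def repl(match):
        name = match.group(0)
        if name in KEYWORDS:  # keep keywords intact
            return name
        if name not in sentinel_of:
            sentinel_of[name] = f"<extra_id_{len(sentinel_of)}>"
        return sentinel_of[name]

    masked = re.sub(r"[A-Za-z_]\w*", repl, code)
    target = " ".join(f"{tok} {name}" for name, tok in sentinel_of.items())
    return masked, target

masked, target = mask_identifiers("def add(a, b): return a + b")
# masked: "def <extra_id_0>(<extra_id_1>, <extra_id_2>): return <extra_id_1> + <extra_id_2>"
# target: "<extra_id_0> add <extra_id_1> a <extra_id_2> b"
```

Note that, unlike T5's ordinary span corruption where each masked span gets a fresh sentinel, all occurrences of one identifier share a single sentinel, which is what forces the model to reason about the role an identifier plays rather than memorize positions.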

Papers Using This Method

- I Know Which LLM Wrote Your Code Last Summer: LLM generated Code Stylometry for Authorship Attribution (2025-06-18)
- A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair (2025-06-05)
- ShIOEnv: A CLI Behavior-Capturing Environment Enabling Grammar-Guided Command Synthesis for Dataset Curation (2025-05-23)
- LogiCase: Effective Test Case Generation from Logical Description in Competitive Programming (2025-05-21)
- Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks (2025-04-28)
- Enhancing Code LLM Training with Programmer Attention (2025-03-19)
- Robust and Secure Code Watermarking for Large Language Models via ML/Crypto Codesign (2025-02-04)
- How to Select Pre-Trained Code Models for Reuse? A Learning Perspective (2025-01-07)
- Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation (2024-12-20)
- Generative Fuzzy System for Sequence Generation (2024-11-21)
- Building A Coding Assistant via the Retrieval-Augmented Language Model (2024-10-21)
- Detection Made Easy: Potentials of Large Language Models for Solidity Vulnerabilities (2024-09-15)
- VulCatch: Enhancing Binary Vulnerability Detection through CodeT5 Decompilation and KAN Advanced Feature Extraction (2024-08-13)
- Empirical Studies of Parameter Efficient Methods for Large Language Models of Code and Knowledge Transfer to R (2024-03-16)
- AST-T5: Structure-Aware Pretraining for Code Generation and Understanding (2024-01-05)
- PerfRL: A Small Language Model Framework for Efficient Code Optimization (2023-12-09)
- Converting Epics/Stories into Pseudocode using Transformers (2023-12-08)
- Learning Defect Prediction from Unrealistic Data (2023-11-02)
- Data Augmentation for Code Translation with Comparable Corpora and Multiple References (2023-11-01)
- Program Repair with Minimal Edits Using CodeT5 (2023-09-26)