Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


CodeT5

Natural Language Processing · Introduced 2021 · 32 papers
Source Paper

Description

CodeT5 is a Transformer-based model for code understanding and generation built on the T5 architecture. It uses an identifier-aware pre-training objective that exploits a crucial kind of token type information in code: the identifiers assigned by developers. Specifically, T5's denoising Seq2Seq objective is extended with two tasks, identifier tagging and masked identifier prediction, so the model learns to distinguish and recover identifiers in programming languages. To improve the alignment between natural language and programming language, a bimodal dual generation objective is also used, training the model to convert in both directions between code and its natural language description.
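To make the masked identifier prediction task concrete, here is a minimal, hypothetical sketch of how identifiers in a snippet can be replaced with T5-style sentinel tokens, with the target sequence mapping each sentinel back to the original name. The function name `mask_identifiers`, the `KEYWORDS` set, and the regex-based tokenization are illustrative assumptions, not the paper's actual implementation (which operates on subword-tokenized code with parser-derived identifier labels).

```python
import re

# Illustrative keyword list so language keywords are not masked;
# a real implementation would use a parser's token types instead.
KEYWORDS = {"def", "return", "if", "else", "for", "in", "while", "print"}

def mask_identifiers(code: str):
    """Replace each distinct identifier with a T5-style sentinel token.

    Every occurrence of the same identifier maps to the same sentinel,
    mirroring CodeT5's masked identifier prediction objective. Returns
    the masked source and the target string the model learns to generate
    (each sentinel followed by the identifier it hides).
    """
    sentinel_of = {}  # identifier -> sentinel token, in order of first use

    def repl(match):
        name = match.group(0)
        if name in KEYWORDS:  # keep keywords intact
            return name
        if name not in sentinel_of:
            sentinel_of[name] = f"<extra_id_{len(sentinel_of)}>"
        return sentinel_of[name]

    masked = re.sub(r"[A-Za-z_]\w*", repl, code)
    target = " ".join(f"{tok} {name}" for name, tok in sentinel_of.items())
    return masked, target

masked, target = mask_identifiers("def add(a, b): return a + b")
# masked: "def <extra_id_0>(<extra_id_1>, <extra_id_2>): return <extra_id_1> + <extra_id_2>"
# target: "<extra_id_0> add <extra_id_1> a <extra_id_2> b"
```

Note that, unlike T5's ordinary span corruption where each masked span gets a fresh sentinel, all occurrences of one identifier share a single sentinel, which is what forces the model to reason about the role an identifier plays rather than memorize positions.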

Papers Using This Method

- I Know Which LLM Wrote Your Code Last Summer: LLM generated Code Stylometry for Authorship Attribution (2025-06-18)
- A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair (2025-06-05)
- ShIOEnv: A CLI Behavior-Capturing Environment Enabling Grammar-Guided Command Synthesis for Dataset Curation (2025-05-23)
- LogiCase: Effective Test Case Generation from Logical Description in Competitive Programming (2025-05-21)
- Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks (2025-04-28)
- Enhancing Code LLM Training with Programmer Attention (2025-03-19)
- Robust and Secure Code Watermarking for Large Language Models via ML/Crypto Codesign (2025-02-04)
- How to Select Pre-Trained Code Models for Reuse? A Learning Perspective (2025-01-07)
- Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation (2024-12-20)
- Generative Fuzzy System for Sequence Generation (2024-11-21)
- Building A Coding Assistant via the Retrieval-Augmented Language Model (2024-10-21)
- Detection Made Easy: Potentials of Large Language Models for Solidity Vulnerabilities (2024-09-15)
- VulCatch: Enhancing Binary Vulnerability Detection through CodeT5 Decompilation and KAN Advanced Feature Extraction (2024-08-13)
- Empirical Studies of Parameter Efficient Methods for Large Language Models of Code and Knowledge Transfer to R (2024-03-16)
- AST-T5: Structure-Aware Pretraining for Code Generation and Understanding (2024-01-05)
- PerfRL: A Small Language Model Framework for Efficient Code Optimization (2023-12-09)
- Converting Epics/Stories into Pseudocode using Transformers (2023-12-08)
- Learning Defect Prediction from Unrealistic Data (2023-11-02)
- Data Augmentation for Code Translation with Comparable Corpora and Multiple References (2023-11-01)
- Program Repair with Minimal Edits Using CodeT5 (2023-09-26)