Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing

Ahmed Elnaggar, Wei Ding, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Silvia Severini, Florian Matthes, Burkhard Rost

2021-04-06

Tasks: Code Documentation Generation · Program Synthesis · Transfer Learning · Multi-Task Learning · Code Comment Generation · Code Generation · Source Code Summarization · Contextual Embedding for Source Code

Paper · PDF · Code (official)

Abstract

Currently, a growing number of mature natural language processing applications make people's lives more convenient. Such applications are built from source code - the language of software engineering. However, applications that understand source code and ease the software engineering process remain under-researched. At the same time, the transformer model, especially in combination with transfer learning, has proven to be a powerful technique for natural language processing tasks. These breakthroughs point to a promising direction for processing source code and cracking software engineering tasks. This paper describes CodeTrans - an encoder-decoder transformer model for the software engineering domain - and explores the effectiveness of encoder-decoder transformer models on six software engineering tasks, comprising thirteen sub-tasks. Moreover, we investigate the effect of different training strategies, including single-task learning, transfer learning, multi-task learning, and multi-task learning with fine-tuning. CodeTrans outperforms the state-of-the-art models on all the tasks. To expedite future work in the software engineering domain, we have published our pre-trained CodeTrans models at https://github.com/agemagician/CodeTrans
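The multi-task learning strategy the abstract mentions is typically implemented by mixing examples from several task datasets into one training stream, tagging each example with a task prefix in the T5 style. The sketch below illustrates that data-mixing idea only; the tiny datasets, task names, and sampling scheme here are illustrative assumptions, not the paper's actual pipeline.

```python
import random

# Hypothetical mini-datasets for two of the paper's tasks (illustrative only)
TASKS = {
    "code documentation generation": [
        ("def add(a, b): return a + b", "Add two numbers."),
    ],
    "git commit message generation": [
        ("- x = 1\n+ x = 2", "Change x to 2."),
    ],
}

def multitask_stream(tasks, steps, seed=0):
    """Yield (source, target) pairs, sampling tasks proportionally to their
    dataset size and prepending a task prefix so one model can serve all tasks."""
    rng = random.Random(seed)
    names = list(tasks)
    weights = [len(tasks[n]) for n in names]
    for _ in range(steps):
        name = rng.choices(names, weights=weights)[0]
        src, tgt = rng.choice(tasks[name])
        # T5-style prefix tells the shared model which task this example is
        yield f"{name}: {src}", tgt
```

Single-task learning corresponds to a `tasks` dict with one entry; multi-task learning with fine-tuning would continue training this mixed stream on a single task afterwards.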

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Text Generation | CodeSearchNet - Python | Smoothed BLEU-4 | 20.39 | CodeTrans-MT-Base |
| Text Generation | CodeSearchNet - Go | Smoothed BLEU-4 | 19.54 | CodeTrans-TF-Large |
| Text Generation | CodeSearchNet - JavaScript | Smoothed BLEU-4 | 18.98 | CodeTrans-TF-Large |
| Text Generation | CodeSearchNet - Php | Smoothed BLEU-4 | 26.23 | CodeTrans-MT-Base |
| Text Generation | CodeSearchNet - Java | Smoothed BLEU-4 | 21.87 | CodeTrans-MT-Large |
| Text Generation | CodeSearchNet - Ruby | Smoothed BLEU-4 | 15.26 | CodeTrans-MT-Base |
| Code Generation | CodeSearchNet - Python | Smoothed BLEU-4 | 20.39 | CodeTrans-MT-Base |
| Code Generation | CodeSearchNet - Go | Smoothed BLEU-4 | 19.54 | CodeTrans-TF-Large |
| Code Generation | CodeSearchNet - JavaScript | Smoothed BLEU-4 | 18.98 | CodeTrans-TF-Large |
| Code Generation | CodeSearchNet - Php | Smoothed BLEU-4 | 26.23 | CodeTrans-MT-Base |
| Code Generation | CodeSearchNet - Java | Smoothed BLEU-4 | 21.87 | CodeTrans-MT-Large |
| Code Generation | CodeSearchNet - Ruby | Smoothed BLEU-4 | 15.26 | CodeTrans-MT-Base |
| Program Synthesis | AlgoLisp | Accuracy | 90.31 | CodeTrans-MT-TF-Small |
| Source Code Summarization | Summarizing Source Code using a Neural Attention Model - C# | Smoothed BLEU-4 | 23.57 | CodeTrans-MT-Large |
| Source Code Summarization | Summarizing Source Code using a Neural Attention Model - Python | Smoothed BLEU-4 | 13.37 | CodeTrans-MT-Base |
| Source Code Summarization | Summarizing Source Code using a Neural Attention Model - SQL | Smoothed BLEU-4 | 19.98 | CodeTrans-MT-TF-Large |
| Git Commit Message Generation | CommitGen | BLEU-4 | 44.41 | CodeTrans-TF-Large |
| API Sequence Recommendation | DeepAPI | BLEU-4 | 73.39 | CodeTrans-MT-TF-Large |
| Code Documentation Generation | CodeSearchNet - Python | Smoothed BLEU-4 | 20.39 | CodeTrans-MT-Base |
| Code Documentation Generation | CodeSearchNet - Go | Smoothed BLEU-4 | 19.54 | CodeTrans-TF-Large |
| Code Documentation Generation | CodeSearchNet - JavaScript | Smoothed BLEU-4 | 18.98 | CodeTrans-TF-Large |
| Code Documentation Generation | CodeSearchNet - Php | Smoothed BLEU-4 | 26.23 | CodeTrans-MT-Base |
| Code Documentation Generation | CodeSearchNet - Java | Smoothed BLEU-4 | 21.87 | CodeTrans-MT-Large |
| Code Documentation Generation | CodeSearchNet - Ruby | Smoothed BLEU-4 | 15.26 | CodeTrans-MT-Base |
| Code Comment Generation | DeepCom | Smoothed BLEU-4 | 39.5 | CodeTrans-TF-Large |
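Most results above are reported in sentence-level smoothed BLEU-4, which applies smoothing so that a candidate with no matching higher-order n-gram does not score zero outright. The stdlib sketch below shows one common smoothing variant (add-one on each n-gram precision); the exact smoothing used for each benchmark may differ, so treat this as illustrative rather than the evaluation script behind the table.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def smoothed_bleu4(candidate, reference):
    """Sentence-level BLEU-4 with add-one smoothing on the n-gram precisions.

    Both arguments are whitespace-tokenized strings; a single reference is
    assumed for simplicity.
    """
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, 5):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # clipped matches: each candidate n-gram counts at most as often
        # as it appears in the reference
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        # add-one smoothing keeps the geometric mean defined when a
        # higher-order n-gram has no match
        log_prec += math.log((matches + 1) / (total + 1)) / 4
    # brevity penalty discourages overly short candidates
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec)
```

An identical candidate and reference score 1.0, and any n-gram overlap yields a score strictly between 0 and 1; the table's values are this quantity scaled to 0-100.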
