Papers With Code 2



Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


English Please: Evaluating Machine Translation with Large Language Models for Multilingual Bug Reports

Avinash Patil, Siru Tao, Aryan Jadon

Date: 2025-02-20
Tasks: Machine Translation, Language Identification, TAG, Translation, Domain Adaptation

Abstract

Accurate translation of bug reports is critical for efficient collaboration in global software development. In this study, we conduct the first comprehensive evaluation of machine translation (MT) performance on bug reports. We analyze the capabilities of DeepL, AWS Translate, and large language models such as ChatGPT, Claude, Gemini, LLaMA, and Mistral, using data from the Visual Studio Code GitHub repository and focusing specifically on reports labeled with the english-please tag. To assess both translation quality and source language identification accuracy, we employ a range of MT evaluation metrics, including BLEU, BERTScore, COMET, METEOR, and ROUGE, alongside classification metrics such as accuracy, precision, recall, and F1-score. Our findings reveal that while ChatGPT (gpt-4o) excels in semantic and lexical translation quality, it does not lead in source language identification. Claude and Mistral achieve the highest F1-scores (0.7182 and 0.7142, respectively), Gemini records the best precision (0.7414), and AWS Translate shows the highest accuracy (0.4717) in identifying source languages. These results highlight that no single system dominates across all tasks, reinforcing the importance of task-specific evaluations. This study underscores the need for domain adaptation when translating technical content and provides actionable insights for integrating MT into bug-triaging workflows. The code and dataset for this paper are available on GitHub: https://github.com/av9ash/English-Please
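Source language identification is scored above with accuracy, precision, recall, and F1. As a rough illustration, the following is a minimal sketch of one common way to compute these from gold and predicted language codes. It assumes macro averaging over languages and a 0.0 value whenever a ratio has a zero denominator; the abstract does not say which averaging the authors used, and the function name `language_id_metrics` is hypothetical.

```python
from collections import Counter


def language_id_metrics(gold: list[str], pred: list[str]) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 for language ID.

    Assumption (not from the paper): macro averaging over every label seen in
    gold or pred, with 0.0 used whenever a ratio has a zero denominator.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")

    tp: Counter[str] = Counter()  # correct predictions per language
    fp: Counter[str] = Counter()  # predicted this language, but gold differed
    fn: Counter[str] = Counter()  # gold was this language, but prediction differed
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    labels = sorted(set(gold) | set(pred))
    precisions = [ratio(tp[lab], tp[lab] + fp[lab]) for lab in labels]
    recalls = [ratio(tp[lab], tp[lab] + fn[lab]) for lab in labels]
    f1s = [ratio(2 * pr * rc, pr + rc) for pr, rc in zip(precisions, recalls)]

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    gold = ["zh", "ja", "ru", "zh"]  # hypothetical gold source languages
    pred = ["zh", "zh", "ru", "zh"]  # hypothetical system predictions
    print(language_id_metrics(gold, pred))
```

With these toy inputs, accuracy is 0.75 and macro F1 is 0.6: the Japanese report misidentified as Chinese gives "ja" an F1 of 0, which macro averaging weights as heavily as the common "zh" label. Weighted or micro averaging would produce different numbers, so the choice matters when comparing systems.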

Results

Task                 Dataset                    Metric     Value  Model
Machine Translation  Multi Lingual Bug Reports  BERTScore  79     ChatGPT
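The result above is reported as a BERTScore. BERTScore compares a candidate translation with a reference by embedding both token sequences with a pretrained contextual model and greedily matching each token to its most similar counterpart by cosine similarity. The sketch below reproduces only that greedy matching step, using hand-written two-dimensional vectors as stand-ins for model embeddings. The real metric obtains embeddings from a pretrained model and can optionally apply idf weighting and baseline rescaling. This page does not say which of precision, recall, or F1 (or which scaling) the value 79 reflects. The names `cosine` and `greedy_bertscore` are hypothetical.

```python
import math

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_bertscore(cand: list[Vector], ref: list[Vector]) -> tuple[float, float, float]:
    """Greedy-matching precision, recall, and F1 (no idf weighting, no rescaling)."""
    if not cand or not ref:
        raise ValueError("candidate and reference must each have at least one token")
    # Precision: match every candidate token to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    # Recall: match every reference token to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in cand) for r in ref) / len(ref)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical 2-D "embeddings": the candidate has one extra, unrelated token.
    candidate = [[1.0, 0.0], [0.0, 1.0]]
    reference = [[1.0, 0.0]]
    print(greedy_bertscore(candidate, reference))
```

In this toy case, precision is 0.5 because the extra candidate token has no close match in the reference. Recall is 1.0 because the single reference token is fully covered. Because matching is done on embeddings rather than exact strings, this is why BERTScore is described as capturing semantic rather than purely lexical similarity.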

Related Papers

A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
Function-to-Style Guidance of LLMs for Code Translation (2025-07-15)
Domain Borders Are There to Be Crossed With Federated Few-Shot Adaptation (2025-07-14)
An Offline Mobile Conversational Agent for Mental Health Support: Learning from Emotional Dialogues and Psychological Texts with Student-Centered Evaluation (2025-07-11)
The Bayesian Approach to Continual Learning: An Overview (2025-07-11)
Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection (2025-07-10)
Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation (2025-07-09)