Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ACES: Translation Accuracy Challenge Sets for Evaluating Machine Translation Metrics

Chantal Amrhein, Nikita Moghe, Liane Guillou

2022-10-27 · Machine Translation, Translation, World Knowledge

Abstract

As machine translation (MT) metrics improve their correlation with human judgement every year, it is crucial to understand the limitations of such metrics at the segment level. Specifically, it is important to investigate metric behaviour when facing accuracy errors in MT because these can have dangerous consequences in certain contexts (e.g., legal, medical). We curate ACES, a translation accuracy challenge set, consisting of 68 phenomena ranging from simple perturbations at the word/character level to more complex errors based on discourse and real-world knowledge. We use ACES to evaluate a wide range of MT metrics including the submissions to the WMT 2022 metrics shared task and perform several analyses leading to general recommendations for metric developers. We recommend: a) combining metrics with different strengths, b) developing metrics that give more weight to the source and less to surface-level overlap with the reference and c) explicitly modelling additional language-specific information beyond what is available via multilingual embeddings.
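To make the challenge-set idea concrete, here is a minimal sketch of how a metric could be probed with contrastive examples. It assumes each example pairs a good translation with an incorrect one, and that the metric is summarized with a Kendall's tau-like statistic (concordant minus discordant pairs, divided by the total). The abstract does not spell out this scoring scheme, so treat it as an assumption. The names `ChallengeExample`, `kendall_tau_like`, and `token_overlap` are hypothetical, and the example sentences are invented for illustration; they are not taken from ACES.

```python
from typing import Callable, Iterable, NamedTuple


class ChallengeExample(NamedTuple):
    """One contrastive item (hypothetical structure, not the ACES file format)."""
    source: str
    reference: str
    good: str        # translation without the accuracy error
    incorrect: str   # translation containing the accuracy error


# A metric maps (source, hypothesis, reference) to a score; higher means better.
Metric = Callable[[str, str, str], float]


def kendall_tau_like(metric: Metric, examples: Iterable[ChallengeExample]) -> float:
    """Return (concordant - discordant) / total over the contrastive pairs.

    A pair is concordant when the metric scores the good translation strictly
    higher than the incorrect one. Ties count as discordant (an assumption).
    """
    concordant = 0
    discordant = 0
    for ex in examples:
        good_score = metric(ex.source, ex.good, ex.reference)
        bad_score = metric(ex.source, ex.incorrect, ex.reference)
        if good_score > bad_score:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    if total == 0:
        raise ValueError("need at least one example")
    return (concordant - discordant) / total


def token_overlap(source: str, hypothesis: str, reference: str) -> float:
    """Toy surface-overlap metric: share of reference tokens found in the hypothesis."""
    hyp_tokens = set(hypothesis.lower().split())
    ref_tokens = set(reference.lower().split())
    return len(hyp_tokens & ref_tokens) / len(ref_tokens) if ref_tokens else 0.0


# Invented examples: the first incorrect translation changes a number but
# still shares most of its tokens with the reference.
EXAMPLES = [
    ChallengeExample(
        source="Nimm zwei Tabletten täglich.",
        reference="Take two tablets daily.",
        good="Take two pills every day.",
        incorrect="Take ten tablets daily.",
    ),
    ChallengeExample(
        source="Der Vertrag endet im Mai.",
        reference="The contract ends in May.",
        good="The contract ends in May.",
        incorrect="The contract ends in March.",
    ),
]

if __name__ == "__main__":
    print(kendall_tau_like(token_overlap, EXAMPLES))
```

In this toy setup the overlap metric prefers the incorrect dosage sentence because it shares more tokens with the reference, which is the kind of surface-level behaviour that recommendation (b) cautions against.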

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Machine Translation | ACES | Score | 19.97 | HWTSC-Teacher-Sim |
| Machine Translation | ACES | Score | 19.89 | MS-COMET-22 |
| Machine Translation | ACES | Score | 19.76 | MS-COMET-QE-22 |
| Machine Translation | ACES | Score | 17.28 | KG-BERTScore |
| Machine Translation | ACES | Score | 17.17 | metricx_xl_DA_2019 |
| Machine Translation | ACES | Score | 16.8 | COMET-QE |
| Machine Translation | ACES | Score | 16.31 | COMET-22 |
| Machine Translation | ACES | Score | 15.68 | UniTE-src |
| Machine Translation | ACES | Score | 15.38 | UniTE-ref |
| Machine Translation | ACES | Score | 15.24 | metricx_xxl_DA_2019 |
| Machine Translation | ACES | Score | 14.76 | UniTE |
| Machine Translation | ACES | Score | 14.07 | Cross-QE |
| Machine Translation | ACES | Score | 13.57 | chrF |
| Machine Translation | ACES | Score | 13.08 | metricx_xl_MQM_2020 |
| Machine Translation | ACES | Score | 12.06 | COMET-20 |
| Machine Translation | ACES | Score | 11.9 | BLEURT-20 |
| Machine Translation | ACES | Score | 11.38 | YiSi-1 |
| Machine Translation | ACES | Score | 10.47 | BERTScore |
| Machine Translation | ACES | Score | -3.13 | BLEU |
| Machine Translation | ACES | Score | -0.33 | f101spBLEU |
| Machine Translation | ACES | Score | -0.18 | f200spBLEU |
