FairTranslate_fr

Texts · MIT · Introduced 2025-04-22

The FairTranslate dataset contains 2,418 English-French sentence pairs, each centered on an occupation, designed to assess how gender is expressed in English-to-French translation. Each English sentence appears in three gender variants (male, female, inclusive), which enables direct counterfactual comparisons. This structure supports fairness evaluations and helps analyze how models handle grammatical gender, inclusive forms, and coreference resolution in translation.
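Because every sentence exists in male, female, and inclusive variants, one natural fairness check is to compare a model's accuracy across the three target genders. The sketch below assumes each example has already been scored as correct or incorrect by some user-chosen criterion; the `Scored` type, the toy results, and the max-minus-min gap are illustrative choices, not metrics defined by the dataset.

```python
from collections import defaultdict
from typing import NamedTuple


class Scored(NamedTuple):
    # Hypothetical per-example result: target gender and whether the output was judged correct.
    gender: str   # "male", "female", or "inclusive"
    correct: bool


def accuracy_by_gender(results):
    """Return {gender: accuracy} computed over the scored examples."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        totals[r.gender] += 1
        hits[r.gender] += int(r.correct)
    return {g: hits[g] / totals[g] for g in totals}


def accuracy_gap(per_gender):
    """Illustrative fairness gap: best-served gender minus worst-served gender."""
    return max(per_gender.values()) - min(per_gender.values())


# Toy results, made up for illustration only (not dataset outcomes).
results = [
    Scored("male", True), Scored("male", True),
    Scored("female", True), Scored("female", False),
    Scored("inclusive", False), Scored("inclusive", False),
]
acc = accuracy_by_gender(results)
print(acc)                # {'male': 1.0, 'female': 0.5, 'inclusive': 0.0}
print(accuracy_gap(acc))  # 1.0
```

A single gap number hides which variant is underserved, so it is usually worth reporting the per-gender accuracies alongside it.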

Each example in the dataset is annotated with the following metadata:

  • English: Source sentence involving an occupation, designed to test both explicit and subtle gender cues.
  • French: Reference translation that matches the intended gender variant.
  • Gender: Target gender of the translation (male, female, or inclusive).
  • Ambiguity: Level of gender ambiguity in the English source sentence:
    • ambiguous: No explicit gender cues.
    • unambiguous: Clear pronouns or cues (e.g., “he”, “she”, “they”).
    • long unambiguous: Gender resolvable from distant context, testing long-range coreference.
  • Stereotype: Whether the occupation is male-stereotyped, female-stereotyped, or gender-balanced, based on statistics from Statbel, the Belgian statistical office.
  • Occupation: A list of the three French forms of the occupation (masculine, feminine, inclusive), e.g., ["infirmier", "infirmière", "infirmier.ière"].
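The Occupation field makes a simple form check possible: search a model's French output for each listed form and see which gender it realizes. The sketch below assumes the list is always ordered masculine, feminine, inclusive (as in the example above); the `Example` record, its field names, and the helper functions are hypothetical and may not match the released files. Exact-string matching is a simplification: it misses plurals, gender agreement elsewhere in the sentence, and inclusive spellings other than the listed one (for instance a middle dot instead of a period).

```python
import re
from dataclasses import dataclass
from typing import List, Optional

GENDERS = ("male", "female", "inclusive")


@dataclass
class Example:
    # Hypothetical record mirroring the fields listed above.
    english: str
    french: str
    gender: str            # "male", "female", or "inclusive"
    ambiguity: str         # "ambiguous", "unambiguous", or "long unambiguous"
    stereotype: str
    occupation: List[str]  # assumed order: [masculine, feminine, inclusive]


def _contains_form(text: str, form: str) -> bool:
    # Whole-word, case-insensitive match. The second lookahead keeps
    # "infirmier" from matching inside "infirmier.ière" while still
    # accepting a sentence-final period ("Il est infirmier.").
    pattern = r"(?<!\w)" + re.escape(form) + r"(?!\w)(?!\.\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def form_used(translation: str, forms: List[str]) -> Optional[str]:
    """Return the gender whose occupation form appears in the translation,
    or None if no listed form, or more than one, is found."""
    found = [g for g, f in zip(GENDERS, forms) if _contains_form(translation, f)]
    return found[0] if len(found) == 1 else None


def is_correct(example: Example, translation: str) -> bool:
    """Hypothetical criterion: the translation uses the target gender's form."""
    return form_used(translation, example.occupation) == example.gender


# Made-up sentences for illustration, not taken from the dataset.
forms = ["infirmier", "infirmière", "infirmier.ière"]
print(form_used("Il est infirmier.", forms))          # male
print(form_used("L'infirmière est arrivée.", forms))  # female
print(form_used("Iel est infirmier.ière.", forms))    # inclusive

ex = Example(
    english="She is a nurse.", french="Elle est infirmière.", gender="female",
    ambiguity="unambiguous", stereotype="hypothetical-label", occupation=forms,
)
print(is_correct(ex, "Elle est infirmier."))  # False
```

Returning None when several forms match keeps the check conservative: a translation that mixes forms is treated as undetermined rather than credited to one gender.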