Gendered Pronoun Resolution using BERT and an extractive question answering formulation

Rakesh Chada

2019-06-09WS 2019 8Question Answering coreference-resolution Coreference Resolution Natural Language Understanding Extractive Question-Answering Multiple-choice

Paper PDF Code(official)

Abstract

The resolution of ambiguous pronouns is a longstanding challenge in Natural Language Understanding. Recent studies have suggested gender bias among state-of-the-art coreference resolution systems. As an example, Google AI Language team recently released a gender-balanced dataset and showed that performance of these coreference resolvers is significantly limited on the dataset. In this paper, we propose an extractive question answering (QA) formulation of pronoun resolution task that overcomes this limitation and shows much lower gender bias (0.99) on their dataset. This system uses fine-tuned representations from the pre-trained BERT model and outperforms the existing baseline by a significant margin (22.2% absolute improvement in F1 score) without using any hand-engineered features. This QA framework is equally performant even without the knowledge of the candidate antecedents of the pronoun. An ensemble of QA and BERT-based multiple choice and sequence classification models further improves the F1 (23.3% absolute improvement upon the baseline). This ensemble model was submitted to the shared task for the 1st ACL workshop on Gender Bias for Natural Language Processing. It ranked 9th on the final official leaderboard. Source code is available at https://github.com/rakeshchada/corefqa

Results

Task	Dataset	Metric	Value	Model
Coreference Resolution	GAP	Bias (F/M)	0.98	Full Ensemble
Coreference Resolution	GAP	Feminine F1 (F)	89.5	Full Ensemble
Coreference Resolution	GAP	Masculine F1 (M)	90.9	Full Ensemble
Coreference Resolution	GAP	Overall F1	90.2	Full Ensemble

Related Papers

From Roots to Rewards: Dynamic Tree Reasoning with RL2025-07-17 Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering2025-07-17 Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It2025-07-17 City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning2025-07-17 The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations2025-07-17 HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models2025-07-17 Describe Anything Model for Visual Question Answering on Text-rich Images2025-07-16 Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility2025-07-16