Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

FQuAD: French Question Answering Dataset

Martin d'Hoffschmidt, Wacim Belblidia, Tom Brendlé, Quentin Heinrich, Maxime Vidal

2020-02-14 · Findings of the Association for Computational Linguistics 2020
Tasks: Reading Comprehension · Question Answering · Machine Reading Comprehension · Language Modelling
Links: Paper · PDF

Abstract

Recent advances in the field of language modeling have improved state-of-the-art results on many Natural Language Processing tasks. Among them, Reading Comprehension has made significant progress over the past few years. However, most results are reported in English since labeled resources available in other languages, such as French, remain scarce. In the present work, we introduce the French Question Answering Dataset (FQuAD). FQuAD is a French Native Reading Comprehension dataset of questions and answers on a set of Wikipedia articles that consists of 25,000+ samples for the 1.0 version and 60,000+ samples for the 1.1 version. We train a baseline model which achieves an F1 score of 92.2 and an exact match ratio of 82.1 on the test set. In order to track the progress of French Question Answering models we propose a leader-board and we have made the 1.0 version of our dataset freely available at https://illuin-tech.github.io/FQuAD-explorer/.

Results

Task                 Dataset   Metric   Value   Model
Question Answering   FQuAD     EM       82.1    CamemBERT-Large
Question Answering   FQuAD     F1       92.2    CamemBERT-Large
Question Answering   FQuAD     EM       79      XLM-RoBERTa-Large
Question Answering   FQuAD     F1       89.5    XLM-RoBERTa-Large
Question Answering   FQuAD     EM       78.4    CamemBERT-Base
Question Answering   FQuAD     F1       88.4    CamemBERT-Base
Question Answering   FQuAD     EM       75.3    XLM-RoBERTa-Base
Question Answering   FQuAD     F1       85.9    XLM-RoBERTa-Base
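The EM and F1 numbers above follow the usual extractive-QA evaluation style popularized by SQuAD: exact match compares the normalized predicted span to the normalized gold span, and F1 measures token overlap between the two. The sketch below is a minimal illustration of those two metrics, not the official FQuAD evaluation script; in particular, the article-stripping rule shown is the English one ("a/an/the"), whereas a French adaptation would handle French articles instead.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and (English) articles, collapse whitespace.
    This mirrors SQuAD-style answer normalization; a French variant would
    strip French articles ('le', 'la', 'les', ...) instead."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    """Token-overlap F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting "Eiffel Tower" against the gold answer "the Eiffel Tower in Paris" scores 0 on exact match but a partial F1 credit, since two of the four normalized gold tokens are matched.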

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)