Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

A Statutory Article Retrieval Dataset in French

Antoine Louis, Gerasimos Spanakis

2021-08-26 | ACL 2022 5 | Information Retrieval, Specificity, Retrieval

Abstract

Statutory article retrieval is the task of automatically retrieving law articles relevant to a legal question. While recent advances in natural language processing have sparked considerable interest in many legal tasks, statutory article retrieval remains primarily untouched due to the scarcity of large-scale and high-quality annotated datasets. To address this bottleneck, we introduce the Belgian Statutory Article Retrieval Dataset (BSARD), which consists of 1,100+ French native legal questions labeled by experienced jurists with relevant articles from a corpus of 22,600+ Belgian law articles. Using BSARD, we benchmark several state-of-the-art retrieval approaches, including lexical and dense architectures, both in zero-shot and supervised setups. We find that fine-tuned dense retrieval models significantly outperform other systems. Our best performing baseline achieves 74.8% R@100, which is promising for the feasibility of the task and indicates there is still room for improvement. By the specificity of the domain and addressed task, BSARD presents a unique challenge problem for future research on legal information retrieval. Our dataset and source code are publicly available.

Results

| Task                  | Dataset | Metric     | Value | Model                          |
|-----------------------|---------|------------|-------|--------------------------------|
| Information Retrieval | BSARD   | Recall@100 | 74.78 | Two-tower Bi-Encoder (RoBERTa) |
| Information Retrieval | BSARD   | Recall@200 | 78.04 | Two-tower Bi-Encoder (RoBERTa) |
| Information Retrieval | BSARD   | Recall@500 | 83.39 | Two-tower Bi-Encoder (RoBERTa) |
| Information Retrieval | BSARD   | Recall@100 | 71.63 | Siamese Bi-Encoder (RoBERTa)   |
| Information Retrieval | BSARD   | Recall@200 | 78.38 | Siamese Bi-Encoder (RoBERTa)   |
| Information Retrieval | BSARD   | Recall@500 | 83.77 | Siamese Bi-Encoder (RoBERTa)   |
| Information Retrieval | BSARD   | Recall@100 | 51.33 | BM25                           |
| Information Retrieval | BSARD   | Recall@200 | 56.78 | BM25                           |
| Information Retrieval | BSARD   | Recall@500 | 64.71 | BM25                           |
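
For readers unfamiliar with the Recall@k figures reported here, the following is a minimal sketch of how such a score is commonly computed. It assumes the usual macro-averaged definition (per-question recall over the top-k retrieved articles, averaged over all questions and expressed as a percentage); the function names, the dictionary layout, and the toy data are illustrative assumptions, not the authors' evaluation code or real BSARD entries.

```python
from typing import Dict, List, Set


def recall_at_k(ranked_ids: List[int], relevant_ids: Set[int], k: int) -> float:
    """Fraction of the relevant articles that appear among the top-k retrieved ones."""
    if not relevant_ids:
        raise ValueError("each question needs at least one relevant article")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(
    rankings: Dict[str, List[int]],
    gold: Dict[str, Set[int]],
    k: int,
) -> float:
    """Macro-average of per-question recall@k, as a percentage."""
    scores = [recall_at_k(rankings[q], gold[q], k) for q in gold]
    return 100.0 * sum(scores) / len(scores)


# Toy example (hypothetical IDs, not taken from BSARD):
# each question maps to a ranked list of retrieved article IDs
# and to the set of articles annotated as relevant.
rankings = {"q1": [3, 7, 1, 9], "q2": [4, 2, 8, 5]}
gold = {"q1": {7, 9}, "q2": {5, 6}}

print(mean_recall_at_k(rankings, gold, k=2))  # 25.0
print(mean_recall_at_k(rankings, gold, k=4))  # 75.0
```

Under this definition, recall can only stay the same or increase as k grows, which matches the pattern in the rows above: each model's score rises from k=100 to k=500.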

Related Papers

Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
A Survey of Context Engineering for Large Language Models (2025-07-17)
MCoT-RE: Multi-Faceted Chain-of-Thought and Re-Ranking for Training-Free Zero-Shot Composed Image Retrieval (2025-07-17)
Developing Visual Augmented Q&A System using Scalable Vision Embedding Retrieval & Late Interaction Re-ranker (2025-07-16)
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos (2025-07-16)
Context-Aware Search and Retrieval Over Erasure Channels (2025-07-16)