Contract Discovery: Dataset and a Few-Shot Semantic Retrieval Challenge with Competitive Baselines

Łukasz Borchmann, Dawid Wiśniewski, Andrzej Gretkowski, Izabela Kosmala, Dawid Jurkiewicz, Łukasz Szałkiewicz, Gabriela Pałka, Karol Kaczmarek, Agnieszka Kaliska, Filip Graliński

Date: 2019-11-10
Venue: Findings of the Association for Computational Linguistics 2020
Tasks: Few-Shot Learning, Semantic Similarity, Semantic Retrieval, Retrieval, Language Modelling

Abstract

We propose a new shared task of semantic retrieval from legal texts, called contract discovery, in which legal clauses are extracted from documents given a few examples of similar clauses from other legal acts. The task differs substantially from conventional NLI and from shared tasks on legal information extraction (e.g., one has to identify a text span rather than a single document, page, or paragraph). The specification of the proposed task is followed by an evaluation of multiple solutions within a unified framework proposed for this branch of methods. It is shown that state-of-the-art pretrained encoders fail to provide satisfactory results on the proposed task. In contrast, Language Model-based solutions perform better, especially when unsupervised fine-tuning is applied. Besides the ablation studies, we address how detection accuracy for relevant text fragments depends on the number of examples available. In addition to the dataset and reference results, LMs specialized in the legal domain have been made publicly available.

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Semantic Retrieval | Contract Discovery | Soft-F1 | 0.84 | Human baseline |
| Semantic Retrieval | Contract Discovery | Soft-F1 | 0.51 | k-NN with sentence n-grams, GPT-2 embeddings, fICA |
| Semantic Retrieval | Contract Discovery | Soft-F1 | 0.39 | LSA baseline |
| Semantic Retrieval | Contract Discovery | Soft-F1 | 0.38 | Universal Sentence Encoder |
| Semantic Retrieval | Contract Discovery | Soft-F1 | 0.31 | Sentence BERT |
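
To make the retrieval setting concrete, the following is a minimal sketch of the general idea behind a nearest-neighbour search over sentence n-grams: every contiguous run of sentences in a document is scored against a handful of example clauses, and the best-scoring span is returned. It is not the authors' implementation. The function names (`split_sentences`, `embed`, `sentence_ngrams`, `find_clause`) are hypothetical, the bag-of-words `embed` is a deliberately simple stand-in for the GPT-2 embeddings named in the results, the fICA step is omitted, and the sentence splitter is naive.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(text):
    # Stand-in encoder (assumption): lowercase bag-of-words counts.
    # The reported system uses GPT-2 embeddings instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sentence_ngrams(sentences, max_n):
    # Yield (start, end, text) for every contiguous run of 1..max_n sentences.
    for n in range(1, max_n + 1):
        for start in range(len(sentences) - n + 1):
            yield start, start + n, " ".join(sentences[start:start + n])


def find_clause(document, examples, max_n=2):
    # Return (score, start, end, text) of the span whose mean similarity
    # to the few-shot example clauses is highest, or None for empty input.
    sentences = split_sentences(document)
    example_vecs = [embed(e) for e in examples]
    best = None
    for start, end, span in sentence_ngrams(sentences, max_n):
        vec = embed(span)
        score = sum(cosine(vec, e) for e in example_vecs) / len(example_vecs)
        if best is None or score > best[0]:
            best = (score, start, end, span)
    return best


if __name__ == "__main__":
    document = (
        "This Agreement is made between the parties. "
        "This Agreement shall be governed by the laws of the State of New York. "
        "Payment is due within thirty days."
    )
    examples = [
        "This Agreement shall be governed by the laws of the State of Delaware.",
        "The contract is governed by the laws of England.",
    ]
    print(find_clause(document, examples))
```

Scoring spans of several sentences rather than single sentences matters because the task asks for a text span of unknown length, not a fixed unit such as a paragraph or page. Swapping `embed` for a stronger encoder would leave the rest of the sketch unchanged.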
