Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


UCPhrase: Unsupervised Context-aware Quality Phrase Tagging

Xiaotao Gu, Zihan Wang, Zhenyu Bi, Yu Meng, Liyuan Liu, Jiawei Han, Jingbo Shang

2021-05-28 · Keyphrase Extraction · Phrase Ranking · Phrase Tagging · Language Modelling

Paper · PDF · Code (official)

Abstract

Identifying and understanding quality phrases from context is a fundamental task in text mining. The most challenging part of this task arguably lies in uncommon, emerging, and domain-specific phrases. The infrequent nature of these phrases significantly hurts the performance of phrase mining methods that rely on sufficient phrase occurrences in the input corpus. Context-aware tagging models, though not restricted by frequency, heavily rely on domain experts for either massive sentence-level gold labels or handcrafted gazetteers. In this work, we propose UCPhrase, a novel unsupervised context-aware quality phrase tagger. Specifically, we induce high-quality phrase spans as silver labels from consistently co-occurring word sequences within each document. Compared with typical context-agnostic distant supervision based on existing knowledge bases (KBs), our silver labels root deeply in the input domain and context, thus having unique advantages in preserving contextual completeness and capturing emerging, out-of-KB phrases. Training a conventional neural tagger based on silver labels usually faces the risk of overfitting phrase surface names. Alternatively, we observe that the contextualized attention maps generated from a transformer-based neural language model effectively reveal the connections between words in a surface-agnostic way. Therefore, we pair such attention maps with the silver labels to train a lightweight span prediction model, which can be applied to new input to recognize (unseen) quality phrases regardless of their surface names or frequency. Thorough experiments on various tasks and datasets, including corpus-level phrase ranking, document-level keyphrase extraction, and sentence-level phrase tagging, demonstrate the superiority of our design over state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
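The silver-label mining step described above (inducing phrase spans from consistently co-occurring word sequences within each document) can be sketched roughly as follows. This is a minimal illustration of the idea, not the paper's released implementation; the function name, the maximum phrase length, and the frequency threshold are all illustrative assumptions.

```python
from collections import Counter


def mine_silver_phrases(sentences, max_len=4, min_count=3):
    """Collect contiguous word sequences that recur within one document.

    sentences: tokenized sentences of a single document (lists of words).
    Returns the set of n-grams (as tuples) occurring at least `min_count`
    times, keeping only maximal ones (not contained in a longer kept n-gram),
    which approximates "contextual completeness" of the mined spans.
    """
    counts = Counter()
    for sent in sentences:
        for n in range(2, max_len + 1):
            for i in range(len(sent) - n + 1):
                counts[tuple(sent[i:i + n])] += 1

    frequent = {g for g, c in counts.items() if c >= min_count}
    # Prefer maximal spans: drop an n-gram that is a sub-span of a longer one.
    maximal = set()
    for g in frequent:
        contained = any(
            h != g and len(h) > len(g)
            and any(h[j:j + len(g)] == g for j in range(len(h) - len(g) + 1))
            for h in frequent
        )
        if not contained:
            maximal.add(g)
    return maximal
```

In the full method, spans mined this way serve as silver labels that are paired with transformer attention maps to train the lightweight span-prediction model, so recognition at inference time does not depend on phrase frequency or surface form.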

Results

Task | Dataset | Metric | Value | Model
---- | ------- | ------ | ----- | -----
Phrase Ranking | KP20k | P@50K | 98.5 | Wiki+RoBERTa
Phrase Ranking | KP20k | P@5K | 100 | Wiki+RoBERTa
Phrase Ranking | KP20k | P@50K | 96.5 | UCPhrase
Phrase Ranking | KP20k | P@5K | 96.5 | UCPhrase
Phrase Ranking | KP20k | P@50K | 78 | TopMine
Phrase Ranking | KP20k | P@5K | 81.5 | TopMine
Phrase Ranking | KPTimes | P@50K | 96.5 | Wiki+RoBERTa
Phrase Ranking | KPTimes | P@5K | 99 | Wiki+RoBERTa
Phrase Ranking | KPTimes | P@50K | 95.5 | UCPhrase
Phrase Ranking | KPTimes | P@5K | 96.5 | UCPhrase
Phrase Ranking | KPTimes | P@50K | 95.5 | AutoPhrase
Phrase Ranking | KPTimes | P@5K | 96.5 | AutoPhrase
Phrase Ranking | KPTimes | P@50K | 71 | TopMine
Phrase Ranking | KPTimes | P@5K | 85.5 | TopMine
Keyphrase Extraction | KP20k | F1@10 | 19.2 | Wiki+RoBERTa
Keyphrase Extraction | KP20k | Recall | 73 | Wiki+RoBERTa
Keyphrase Extraction | KP20k | F1@10 | 19.7 | UCPhrase
Keyphrase Extraction | KP20k | Recall | 72.9 | UCPhrase
Keyphrase Extraction | KP20k | F1@10 | 18.2 | AutoPhrase
Keyphrase Extraction | KP20k | Recall | 62.9 | AutoPhrase
Keyphrase Extraction | KP20k | F1@10 | 15.3 | Spacy
Keyphrase Extraction | KP20k | Recall | 59.5 | Spacy
Keyphrase Extraction | KP20k | F1@10 | 12.6 | PKE
Keyphrase Extraction | KP20k | Recall | 57.1 | PKE
Keyphrase Extraction | KP20k | F1@10 | 15 | TopMine
Keyphrase Extraction | KP20k | Recall | 53.3 | TopMine
Keyphrase Extraction | KP20k | F1@10 | 13.9 | StanfordNLP
Keyphrase Extraction | KP20k | Recall | 51.7 | StanfordNLP
Keyphrase Extraction | KPTimes | F1@10 | 10.9 | UCPhrase
Keyphrase Extraction | KPTimes | Recall | 83.4 | UCPhrase
Keyphrase Extraction | KPTimes | F1@10 | 10.3 | AutoPhrase
Keyphrase Extraction | KPTimes | Recall | 77.8 | AutoPhrase
Keyphrase Extraction | KPTimes | F1@10 | 9.4 | Wiki+RoBERTa
Keyphrase Extraction | KPTimes | Recall | 64.5 | Wiki+RoBERTa
Keyphrase Extraction | KPTimes | F1@10 | 8.5 | TopMine
Keyphrase Extraction | KPTimes | Recall | 63.4 | TopMine
Phrase Tagging | KPTimes | F1 | 73.5 | UCPhrase
Phrase Tagging | KPTimes | Precision | 69.1 | UCPhrase
Phrase Tagging | KPTimes | Recall | 78.9 | UCPhrase
Phrase Tagging | KPTimes | F1 | 63.2 | Wiki+RoBERTa
Phrase Tagging | KPTimes | Precision | 60.9 | Wiki+RoBERTa
Phrase Tagging | KPTimes | Recall | 65.6 | Wiki+RoBERTa
Phrase Tagging | KPTimes | F1 | 45.9 | AutoPhrase
Phrase Tagging | KPTimes | Precision | 44.2 | AutoPhrase
Phrase Tagging | KPTimes | Recall | 47.7 | AutoPhrase
Phrase Tagging | KPTimes | F1 | 34 | TopMine
Phrase Tagging | KPTimes | Precision | 32 | TopMine
Phrase Tagging | KPTimes | Recall | 36.3 | TopMine
Phrase Tagging | KP20k | F1 | 73.9 | UCPhrase
Phrase Tagging | KP20k | Precision | 69.9 | UCPhrase
Phrase Tagging | KP20k | Recall | 78.3 | UCPhrase
Phrase Tagging | KP20k | F1 | 61 | Wiki+RoBERTa
Phrase Tagging | KP20k | Precision | 58.1 | Wiki+RoBERTa
Phrase Tagging | KP20k | Recall | 64.2 | Wiki+RoBERTa
Phrase Tagging | KP20k | F1 | 49.7 | AutoPhrase
Phrase Tagging | KP20k | Precision | 55.2 | AutoPhrase
Phrase Tagging | KP20k | Recall | 45.2 | AutoPhrase
Phrase Tagging | KP20k | F1 | 40.6 | TopMine
Phrase Tagging | KP20k | Precision | 39.8 | TopMine
Phrase Tagging | KP20k | Recall | 41.4 | TopMine
