TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/HILL: Hierarchy-aware Information Lossless Contrastive Lea...

HILL: Hierarchy-aware Information Lossless Contrastive Learning for Hierarchical Text Classification

He Zhu, Junran Wu, Ruomei Liu, Yue Hou, Ze Yuan, Shangzhe Li, YiCheng Pan, Ke Xu

2024-03-26Text ClassificationDocument EmbeddingRepresentation Learningtext-classificationContrastive LearningHierarchical Multi-label Classification
PaperPDFCode(official)

Abstract

Existing self-supervised methods in natural language processing (NLP), especially hierarchical text classification (HTC), mainly focus on self-supervised contrastive learning, extremely relying on human-designed augmentation rules to generate contrastive samples, which can potentially corrupt or distort the original information. In this paper, we tend to investigate the feasibility of a contrastive learning scheme in which the semantic and syntactic information inherent in the input sample is adequately reserved in the contrastive samples and fused during the learning process. Specifically, we propose an information lossless contrastive learning strategy for HTC, namely \textbf{H}ierarchy-aware \textbf{I}nformation \textbf{L}ossless contrastive \textbf{L}earning (HILL), which consists of a text encoder representing the input document, and a structure encoder directly generating the positive sample. The structure encoder takes the document embedding as input, extracts the essential syntactic information inherent in the label hierarchy with the principle of structural entropy minimization, and injects the syntactic information into the text representation via hierarchical representation learning. Experiments on three common datasets are conducted to verify the superiority of HILL.

Results

TaskDatasetMetricValueModel
Multi-Label ClassificationRCV1-v2Macro F170.12HILL
Multi-Label ClassificationRCV1-v2Micro F187.31HILL
Multi-Label ClassificationNew York Times Annotated CorpusMacro F169.96HILL
Multi-Label ClassificationNew York Times Annotated CorpusMicro F180.47HILL
Multi-Label ClassificationWOSMacro F181.77HILL
Multi-Label ClassificationWOSMicro F187.28HILL

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper2025-07-20Making Language Model a Hierarchical Classifier and Generator2025-07-17Spectral Bellman Method: Unifying Representation and Exploration in RL2025-07-17Boosting Team Modeling through Tempo-Relational Representation Learning2025-07-17SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts2025-07-17HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals2025-07-17Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management2025-07-17SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation2025-07-17