Continual Pre-training of Language Models

Zixuan Ke, Yijia Shao, Haowei Lin, Tatsuya Konishi, Gyuhak Kim, Bing Liu

2023-02-07 · Continual Learning · Transfer Learning · Continual Pretraining · General Knowledge
Paper · PDF · Code (official) · Code

Abstract

Language models (LMs) have been instrumental in the rapid advance of natural language processing. This paper studies continual pre-training of LMs, in particular continual domain-adaptive pre-training (continual DAP-training). Existing research has shown that further pre-training an LM on a domain corpus to adapt it to that domain can improve end-task performance in the domain. This paper proposes a novel method for continually DAP-training an LM on a sequence of unlabeled domain corpora, adapting the LM to each domain to improve its end-task performance. The key novelty of our method is a soft-masking mechanism that directly controls the updates to the LM. A novel proxy is also proposed to preserve the general knowledge in the original LM. Additionally, the method contrasts the representations of previously learned domain knowledge (including the general knowledge in the pre-trained LM) with those of the current full network to achieve knowledge integration. The method not only overcomes catastrophic forgetting but also achieves knowledge transfer that improves end-task performance. Empirical evaluation demonstrates the effectiveness of the proposed method.
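The two mechanisms described in the abstract can be illustrated concretely. First, soft-masking: during DAP-training on a new domain, each unit's gradient is scaled down in proportion to its estimated importance to previously acquired knowledge, so important units change little while unimportant ones remain free to adapt. Below is a minimal PyTorch sketch, not the paper's implementation: soft_mask_gradients is a hypothetical helper, and the importance scores are random placeholders standing in for scores the paper derives from its general-knowledge proxy.

import torch
import torch.nn as nn

def soft_mask_gradients(model, importance):
    # Scale each parameter's gradient by (1 - importance) so that units
    # judged important to previously learned knowledge receive smaller
    # updates. `importance` maps parameter names to tensors in [0, 1].
    for name, param in model.named_parameters():
        imp = importance.get(name)
        if param.grad is not None and imp is not None:
            param.grad.mul_(1.0 - imp)

# Toy usage: a tiny linear layer stands in for the LM, and random scores
# stand in for the proxy-derived importance (both are assumptions).
model = nn.Linear(8, 8)
importance = {name: torch.rand_like(p) for name, p in model.named_parameters()}
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randn(4, 8)
loss = model(x).pow(2).mean()           # stand-in for the pre-training loss
loss.backward()
soft_mask_gradients(model, importance)  # dampen updates to important units
optimizer.step()

Second, knowledge integration: the abstract contrasts representations of previously learned knowledge with those of the current full network. A standard way to express such a contrast is an InfoNCE-style loss, sketched below under that assumption (the paper's exact loss may differ); contrastive_integration_loss is a hypothetical name.

import torch
import torch.nn.functional as F

def contrastive_integration_loss(z_full, z_prev, temperature=0.1):
    # Pull each full-network representation toward its matching
    # previously-learned representation (diagonal positives) and push it
    # away from the other examples in the batch (off-diagonal negatives).
    z_full = F.normalize(z_full, dim=-1)
    z_prev = F.normalize(z_prev, dim=-1)
    logits = z_full @ z_prev.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(z_full.size(0))      # positives on the diagonal
    return F.cross_entropy(logits, targets)

Aligning the two representation sets this way encourages the network to integrate new domain knowledge without drifting away from what it already encodes.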

Results

Task                  | Dataset | Metric     | Value  | Model
----------------------|---------|------------|--------|------
Continual Pretraining | ACL-ARC | F1 (macro) | 0.6936 | DAS
Continual Pretraining | SciERC  | F1 (macro) | 0.7093 | DAS

Related Papers

RaMen: Multi-Strategy Multi-Modal Learning for Bundle Construction (2025-07-18)
Disentangling coincident cell events using deep transfer learning and compressive sensing (2025-07-17)
RegCL: Continual Adaptation of Segment Anything Model via Model Merging (2025-07-16)
Information-Theoretic Generalization Bounds of Replay-based Continual Learning (2025-07-16)
PROL: Rehearsal Free Continual Learning in Streaming Data via Prompt Online Learning (2025-07-16)
Best Practices for Large-Scale, Pixel-Wise Crop Mapping and Transfer Learning Workflows (2025-07-16)
Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime (2025-07-15)
A Neural Network Model of Complementary Learning Systems: Pattern Separation and Completion for Continual Learning (2025-07-15)