Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

GLM: General Language Model Pretraining with Autoregressive Blank Infilling

Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang

2021-03-18 · ACL 2022 · Tasks: Abstractive Text Summarization, Natural Language Understanding, Document Summarization, General Classification, Classification, Language Modelling
Links: Paper · PDF · Code (official implementation plus several community implementations)

Abstract

There have been various types of pretraining architectures including autoencoding models (e.g., BERT), autoregressive models (e.g., GPT), and encoder-decoder models (e.g., T5). However, none of the pretraining frameworks performs the best for all tasks of three main categories including natural language understanding (NLU), unconditional generation, and conditional generation. We propose a General Language Model (GLM) based on autoregressive blank infilling to address this challenge. GLM improves blank filling pretraining by adding 2D positional encodings and allowing an arbitrary order to predict spans, which results in performance gains over BERT and T5 on NLU tasks. Meanwhile, GLM can be pretrained for different types of tasks by varying the number and lengths of blanks. On a wide range of tasks across NLU, conditional and unconditional generation, GLM outperforms BERT, T5, and GPT given the same model sizes and data, and achieves the best performance from a single pretrained model with 1.25× the parameters of BERT-Large, demonstrating its generalizability to different downstream tasks.
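
To make the abstract's description of autoregressive blank infilling concrete, here is a minimal, hedged sketch of how a training example might be built: sampled spans are collapsed to [MASK] tokens in the corrupted text (Part A), the spans themselves are appended in shuffled order (Part B), and each token gets the two positional ids the paper describes. This is not the authors' code; the function name, the literal token strings, and the omission of the target-side [END] token and attention mask are simplifications made for illustration.

```python
import random

MASK, START = "[MASK]", "[START]"

def blank_infilling_example(tokens, spans):
    """Sketch of a GLM-style blank-infilling example.

    tokens : list of input tokens
    spans  : sorted, non-overlapping (start, end) index pairs to blank out
             (end exclusive)

    Returns the concatenated input sequence plus the two positional id
    streams: position-1 ids index into the corrupted text, position-2 ids
    give the position inside each blanked span.
    """
    # Part A: the corrupted text, with each span collapsed to one [MASK].
    part_a, mask_positions, cursor = [], [], 0
    for start, end in spans:
        part_a.extend(tokens[cursor:start])
        mask_positions.append(len(part_a))
        part_a.append(MASK)
        cursor = end
    part_a.extend(tokens[cursor:])

    # Part B: the blanked spans in a random order, each opened by [START]
    # (the shifted [END] targets and the attention mask are omitted here).
    order = list(range(len(spans)))
    random.shuffle(order)

    sequence = list(part_a)
    pos1 = list(range(len(part_a)))   # positions within the corrupted text
    pos2 = [0] * len(part_a)          # 0 for all Part A tokens
    for i in order:
        start, end = spans[i]
        span_tokens = [START] + tokens[start:end]
        sequence.extend(span_tokens)
        pos1.extend([mask_positions[i]] * len(span_tokens))
        pos2.extend(range(1, len(span_tokens) + 1))
    return sequence, pos1, pos2

seq, pos1, pos2 = blank_infilling_example(
    ["x1", "x2", "x3", "x4", "x5", "x6"], [(2, 3), (4, 6)]
)
print(seq)   # e.g. ['x1', 'x2', '[MASK]', 'x4', '[MASK]', '[START]', 'x5', 'x6', '[START]', 'x3']
print(pos1)  # Part B tokens reuse the position of their [MASK] in Part A
print(pos2)  # intra-span positions count from 1 in Part B, stay 0 elsewhere
```

Varying how many spans are sampled and how long they are is what lets the same objective cover NLU-style short blanks, sentence-level infilling, and full document generation.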

Results

Task                           | Dataset          | Metric          | Value | Model
Language Modelling             | WikiText-103     | Test perplexity | 11.33 | GLM-XXLarge (bidirectional)
Language Modelling             | WikiText-103     | Test perplexity | 12.22 | GLM-XXLarge (unidirectional)
Language Modelling             | LAMBADA          | Accuracy        | 72.35 | GLM-XXLarge (bidirectional)
Language Modelling             | LAMBADA          | Accuracy        | 67.18 | GLM-XXLarge (unidirectional)
Text Summarization             | CNN / Daily Mail | ROUGE-1         | 44.7  | GLM-XXLarge
Text Summarization             | CNN / Daily Mail | ROUGE-2         | 21.4  | GLM-XXLarge
Text Summarization             | CNN / Daily Mail | ROUGE-L         | 41.4  | GLM-XXLarge
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-1         | 44.7  | GLM-XXLarge
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-2         | 21.4  | GLM-XXLarge
Abstractive Text Summarization | CNN / Daily Mail | ROUGE-L         | 41.4  | GLM-XXLarge
Document Summarization         | CNN / Daily Mail | ROUGE-1         | 44.7  | GLM-XXLarge
Document Summarization         | CNN / Daily Mail | ROUGE-2         | 21.4  | GLM-XXLarge
Document Summarization         | CNN / Daily Mail | ROUGE-L         | 41.4  | GLM-XXLarge
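
For readers unfamiliar with the "Test perplexity" metric in the table above, this short sketch shows the standard definition (exponential of the mean negative log-likelihood per token); exact WikiText-103 numbers also depend on tokenization details not shown here, and the function name is just a placeholder for illustration.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token.

    token_log_probs : natural-log probabilities the model assigns to each
    reference token in the evaluation set.
    """
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Toy check: a model that assigned every token probability 1/11.33 would
# score 11.33, the bidirectional GLM-XXLarge result reported above.
print(round(perplexity([math.log(1 / 11.33)] * 1000), 2))  # 11.33
```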

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)
The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations (2025-07-17)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Safeguarding Federated Learning-based Road Condition Classification (2025-07-16)