Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE

Qihuang Zhong, Liang Ding, Yibing Zhan, Yu Qiao, Yonggang Wen, Li Shen, Juhua Liu, Baosheng Yu, Bo Du, Yixin Chen, Xinbo Gao, Chunyan Miao, Xiaoou Tang, Dacheng Tao

2022-12-04

Tasks: Question Answering · Coreference Resolution · Natural Language Inference · Common Sense Reasoning · Masked Language Modeling · Word Sense Disambiguation · Language Modelling

Paper · PDF

Abstract

This technical report briefly describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard. SuperGLUE is more challenging than the widely used general language understanding evaluation (GLUE) benchmark, containing eight difficult language understanding tasks, including question answering, natural language inference, word sense disambiguation, coreference resolution, and reasoning. [Method] Instead of arbitrarily increasing the size of a pretrained language model (PLM), our aim is to 1) fully extract knowledge from the input pretraining data given a certain parameter budget, e.g., 6B, and 2) effectively transfer this knowledge to downstream tasks. To achieve goal 1), we propose self-evolution learning for PLMs to wisely predict the informative tokens that should be masked, and supervise the masked language modeling (MLM) process with rectified smooth labels. For goal 2), we leverage the prompt transfer technique to improve the low-resource tasks by transferring the knowledge from the foundation model and related downstream tasks to the target task. [Results] According to our submission record (Oct. 2022), with our optimized pretraining and fine-tuning strategies, our 6B Vega method achieved new state-of-the-art performance on 4/8 tasks, sitting atop the SuperGLUE leaderboard on Oct. 8, 2022, with an average score of 91.3.
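The two pretraining ideas in the abstract — masking the informative tokens rather than random ones, and supervising MLM with rectified smooth labels — can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's implementation: the loss-based selection rule in `select_informative`, the mixing weight `alpha`, and the label-rectification formula are simplifications chosen for exposition.

```python
import numpy as np

def token_losses(probs, targets):
    # Per-token cross-entropy of the current PLM's predictions.
    # probs: (n_tokens, vocab) predicted distributions; targets: gold token ids.
    return -np.log(probs[np.arange(len(targets)), targets])

def select_informative(probs, targets, mask_ratio=0.15):
    # Self-evolution masking (sketch): instead of masking tokens uniformly
    # at random, mask the tokens the current model finds hardest, i.e. the
    # ones with the highest cross-entropy loss.
    losses = token_losses(probs, targets)
    k = max(1, int(len(targets) * mask_ratio))
    return np.argsort(losses)[-k:]  # indices of the k highest-loss tokens

def rectified_smooth_labels(probs, targets, alpha=0.1):
    # Rectified smooth labels (sketch): blend the one-hot ground truth with
    # the model's own prediction, then renormalize so each row is a valid
    # distribution. alpha is an assumed hyperparameter, not from the paper.
    n, v = probs.shape
    one_hot = np.zeros_like(probs)
    one_hot[np.arange(n), targets] = 1.0
    smooth = (1.0 - alpha) * one_hot + alpha * probs
    return smooth / smooth.sum(axis=1, keepdims=True)

# Toy example: 4 tokens over a 3-word vocabulary.
probs = np.array([[0.70, 0.20, 0.10],
                  [0.10, 0.80, 0.10],
                  [0.30, 0.30, 0.40],
                  [0.05, 0.05, 0.90]])
targets = np.array([0, 1, 0, 2])

masked = select_informative(probs, targets, mask_ratio=0.5)
labels = rectified_smooth_labels(probs, targets, alpha=0.2)
```

With `mask_ratio=0.5`, tokens 0 and 2 are masked here, since the model assigns their gold words the lowest probability; the smoothed labels still put most of their mass on the gold token but retain some of the model's own belief, which is the general shape of the self-distillation-style supervision the abstract describes.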

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Question Answering | COPA | Accuracy | 99.4 | Vega v2 6B (KD-based prompt transfer) |
| Question Answering | COPA | Accuracy | 98.2 | Turing NLR v5 XXL 5.4B (fine-tuned) |
| Question Answering | MultiRC | EM | 63 | Turing NLR v5 XXL 5.4B (fine-tuned) |
| Question Answering | MultiRC | F1 | 88.4 | Turing NLR v5 XXL 5.4B (fine-tuned) |
| Question Answering | MultiRC | EM | 62.4 | Vega v2 6B (fine-tuned) |
| Question Answering | MultiRC | F1 | 88.2 | Vega v2 6B (fine-tuned) |
| Question Answering | BoolQ | Accuracy | 92 | Turing NLR v5 XXL 5.4B (fine-tuned) |
| Question Answering | BoolQ | Accuracy | 90.5 | Vega v2 6B (fine-tuned) |
| Common Sense Reasoning | ReCoRD | EM | 95.9 | Turing NLR v5 XXL 5.4B (fine-tuned) |
| Common Sense Reasoning | ReCoRD | F1 | 96.4 | Turing NLR v5 XXL 5.4B (fine-tuned) |
| Common Sense Reasoning | ReCoRD | EM | 93.9 | Vega v2 6B (fine-tuned) |
| Common Sense Reasoning | ReCoRD | F1 | 94.4 | Vega v2 6B (fine-tuned) |
| Word Sense Disambiguation | Words in Context | Accuracy | 77.4 | Vega v2 6B (fine-tuned) |
| Word Sense Disambiguation | Words in Context | Accuracy | 77.1 | Turing NLR v5 XXL 5.4B (fine-tuned) |
| Natural Language Inference | WNLI | Accuracy | 95.9 | Turing NLR v5 XXL 5.4B (fine-tuned) |
| Natural Language Inference | CommitmentBank | Accuracy | 99.2 | Vega v2 6B (KD-based prompt transfer) |
| Natural Language Inference | CommitmentBank | F1 | 98.6 | Vega v2 6B (KD-based prompt transfer) |
| Natural Language Inference | CommitmentBank | Accuracy | 97.6 | Turing NLR v5 XXL 5.4B (fine-tuned) |
| Natural Language Inference | CommitmentBank | F1 | 95.9 | Turing NLR v5 XXL 5.4B (fine-tuned) |
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 98.6 | Vega v2 6B (KD-based prompt transfer) |
| Coreference Resolution | Winograd Schema Challenge | Accuracy | 97.3 | Turing NLR v5 XXL 5.4B (fine-tuned) |

Related Papers

- Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
- From Roots to Rewards: Dynamic Tree Reasoning with RL (2025-07-17)
- Enter the Mind Palace: Reasoning and Planning for Long-term Active Embodied Question Answering (2025-07-17)
- Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It (2025-07-17)
- City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning (2025-07-17)
- Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)
- Making Language Model a Hierarchical Classifier and Generator (2025-07-17)
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning (2025-07-17)