Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


N-Grammer: Augmenting Transformers with latent n-grams

Aurko Roy, Rohan Anil, Guangda Lai, Benjamin Lee, Jeffrey Zhao, Shuyuan Zhang, Shibo Wang, Ye Zhang, Shen Wu, Rigel Swavely, Tao Yu, Phuong Dao, Christopher Fifty, Zhifeng Chen, Yonghui Wu

2022-07-13 · Text Classification · Question Answering · Coreference Resolution · Natural Language Inference · Common Sense Reasoning · Word Sense Disambiguation · Language Modelling

Paper · PDF · Code (official)

Abstract

Transformer models have recently emerged as one of the foundational models in natural language processing, and as a byproduct, there is significant recent interest and investment in scaling these models. However, the training and inference costs of these large Transformer language models are prohibitive, thus necessitating more research in identifying more efficient variants. In this work, we propose a simple yet effective modification to the Transformer architecture inspired by the literature in statistical language modeling, by augmenting the model with n-grams that are constructed from a discrete latent representation of the text sequence. We evaluate our model, the N-Grammer, on language modeling on the C4 dataset as well as text classification on the SuperGLUE dataset, and find that it outperforms several strong baselines such as the Transformer and the Primer. We open-source our model in Jax for reproducibility purposes.
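The core mechanism described in the abstract can be sketched in a few steps: quantize each token embedding to a discrete latent id via a learned codebook, form bi-gram ids from consecutive latent ids, hash those into a fixed-size n-gram embedding table, and combine the looked-up n-gram embeddings with the token embeddings. Below is a minimal NumPy sketch of that idea; it assumes a single codebook and a simple modulo hash (the paper uses product quantization over multiple heads, layer normalization, and a different combination scheme), and every name here is illustrative rather than taken from the released Jax code.

```python
import numpy as np

def ngrammer_sketch(token_emb, codebook, ngram_table, hash_size):
    """Illustrative sketch of latent n-gram augmentation.

    token_emb:   (seq, d)        token embeddings
    codebook:    (k, d)          learned centroids for quantization
    ngram_table: (hash_size, m)  n-gram embedding table
    """
    # 1) Discrete latent ids: nearest codebook centroid per token.
    dists = ((token_emb[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    ids = dists.argmin(axis=-1)                      # (seq,)

    # 2) Bi-gram ids from consecutive latent ids (position 0 padded with 0).
    prev = np.concatenate([[0], ids[:-1]])
    bigram_ids = prev * codebook.shape[0] + ids      # unique id per (prev, cur) pair

    # 3) Hash bi-gram ids into the fixed-size table (modulo hash for illustration).
    ngram_emb = ngram_table[bigram_ids % hash_size]  # (seq, m)

    # 4) Combine with the token embeddings (concatenation, for illustration).
    return np.concatenate([token_emb, ngram_emb], axis=-1)
```

Because the n-gram features come from a small lookup rather than extra attention layers, the augmentation adds representational capacity at little inference cost, which is the efficiency argument the abstract makes.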

Results

| Task                        | Dataset                    | Metric     | Value | Model          |
|-----------------------------|----------------------------|------------|-------|----------------|
| Question Answering          | COPA                       | Accuracy   | 60    | N-Grammer 343M |
| Question Answering          | MultiRC                    | EM         | 11.3  | N-Grammer 343M |
| Question Answering          | MultiRC                    | F1         | 62    | N-Grammer 343M |
| Question Answering          | BoolQ                      | Accuracy   | 65    | N-Grammer 343M |
| Common Sense Reasoning      | ReCoRD                     | EM         | 28.9  | N-Grammer 343M |
| Common Sense Reasoning      | ReCoRD                     | F1         | 29.9  | N-Grammer 343M |
| Word Sense Disambiguation   | Words in Context           | Accuracy   | 56.1  | N-Grammer 343M |
| Natural Language Inference  | CommitmentBank             | Accuracy   | 67.9  | N-Grammer 343M |
| Natural Language Inference  | CommitmentBank             | F1         | 59.7  | N-Grammer 343M |
| Language Modelling          | C4                         | Perplexity | 14.79 | N-Grammer 343M |
| Language Modelling          | C4                         | Perplexity | 15.01 | N-Grammer 288M |
| Coreference Resolution      | Winograd Schema Challenge  | Accuracy   | 68.3  | N-Grammer 343M |
