Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le

2020-06-05 | NeurIPS 2020 | Text Classification, Reading Comprehension

Abstract

With the success of language pretraining, it is highly desirable to develop more efficient architectures of good scalability that can exploit the abundant unlabeled data at a lower cost. To improve efficiency, we examine the much-overlooked redundancy in maintaining a full-length token-level representation, especially for tasks that only require a single-vector representation of the sequence. With this intuition, we propose Funnel-Transformer, which gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, we further improve the model capacity. In addition, to perform token-level predictions as required by common pretraining objectives, Funnel-Transformer is able to recover a deep representation for each token from the reduced hidden sequence via a decoder. Empirically, with comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks, including text classification, language understanding, and reading comprehension. The code and pretrained checkpoints are available at https://github.com/laiguokun/Funnel-Transformer.
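
The compress-then-recover idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not name the pooling or upsampling operators, so strided mean pooling and upsampling by repetition are assumptions here, and the function names `pool_sequence`, `upsample_sequence`, and `funnel_lengths` are hypothetical.

```python
import numpy as np


def pool_sequence(h, stride=2):
    """Shorten a (seq_len, d) array of hidden states by strided mean pooling.

    Assumption of this sketch: if seq_len is not a multiple of `stride`,
    the last state is repeated as padding before pooling.
    """
    seq_len, d = h.shape
    pad = (-seq_len) % stride
    if pad:
        h = np.concatenate([h, np.repeat(h[-1:], pad, axis=0)], axis=0)
    # Group consecutive states into windows of size `stride` and average them.
    return h.reshape(-1, stride, d).mean(axis=1)


def upsample_sequence(h_short, target_len, stride=2):
    """Recover a full-length sequence by repeating each compressed state."""
    return np.repeat(h_short, stride, axis=0)[:target_len]


def funnel_lengths(seq_len, num_blocks, stride=2):
    """Sequence length seen by each block when pooling once between blocks."""
    lengths = [seq_len]
    for _ in range(num_blocks - 1):
        lengths.append(-(-lengths[-1] // stride))  # ceiling division
    return lengths


# Toy hidden states: 6 tokens, 2 features each.
h = np.arange(12.0).reshape(6, 2)
pooled = pool_sequence(h)                # shape (3, 2)
restored = upsample_sequence(pooled, 6)  # shape (6, 2)
```

Under these assumptions, a 512-token input processed by three blocks would be seen at lengths 512, 256, and 128, which is where the FLOPs savings that the abstract proposes to re-invest in depth or width would come from.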

Results

Task	Dataset	Metric	Value	Model
Reading Comprehension	RACE	Accuracy	85.7	B10-10-10
Reading Comprehension	RACE	Accuracy (High)	84.4	B10-10-10
Reading Comprehension	RACE	Accuracy (Middle)	88.8	B10-10-10
