Description
The Universal Transformer is a generalization of the Transformer architecture. Instead of a fixed stack of distinct layers, it applies a single shared layer recurrently in depth, combining the parallelizability and global receptive field of feed-forward sequence models like the Transformer with the recurrent inductive bias of RNNs. It also uses a dynamic per-position halting mechanism, based on Adaptive Computation Time (ACT), so that each position can stop refining its representation after a different number of steps.
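The depth-recurrence and per-position halting can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the shared "layer" is a stand-in linear map (a real Universal Transformer uses a full self-attention plus transition block), and the halting probe is an assumed fixed sigmoid over the state rather than a learned unit.

```python
import numpy as np

def universal_transformer_encode(x, step_fn, max_steps=8, threshold=0.99):
    """Apply one shared layer repeatedly with per-position ACT-style halting.

    x       : (seq_len, d_model) input states
    step_fn : shared transition function (same weights at every depth step)
    Returns the halting-weighted average of the per-step states.
    """
    seq_len, _ = x.shape
    halting = np.zeros(seq_len)      # accumulated halting probability per position
    weighted = np.zeros_like(x)      # ACT-weighted output state
    state = x
    for step_idx in range(max_steps):
        still_running = halting < threshold
        if not still_running.any():
            break
        # per-position halting probability (assumed probe, not a learned unit)
        p = 1.0 / (1.0 + np.exp(-state.mean(axis=1)))
        p = np.where(still_running, p, 0.0)
        # positions crossing the threshold (or at the last step) halt,
        # contributing their remaining probability mass as the update weight
        last = step_idx == max_steps - 1
        new_halted = still_running & ((halting + p >= threshold) | last)
        update = np.where(new_halted, 1.0 - halting, p)
        halting = halting + update
        weighted += update[:, None] * state
        state = step_fn(state)       # the same shared layer every step
    return weighted

# toy shared "layer": fixed linear map + nonlinearity
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(16, 16))
step = lambda h: np.tanh(h @ W)

out = universal_transformer_encode(rng.normal(size=(5, 16)), step)
```

Each position accumulates halting probability across steps; once it exceeds the threshold, that position's state stops being updated in the output, while unhalted positions continue through further applications of the shared layer.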
Papers Using This Method
PLUTO: Pathology-Universal Transformer (2024-05-13)
Investigating Recurrent Transformers with Dynamic Halt (2024-02-01)
Self-Critical Alternate Learning based Semantic Broadcast Communication (2023-12-03)
Sparse Universal Transformer (2023-10-11)
UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization (2023-08-28)
Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition (2023-03-23)
Semantic Communication with Memory (2023-03-22)
Towards Autoformalization of Mathematics and Code Correctness: Experiments with Elementary Proofs (2023-01-05)
Universal Transformer Hawkes Process with Adaptive Recursive Iteration (2021-12-29)
The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers (2021-08-26)
Using BERT Encoding and Sentence-Level Language Model for Sentence Ordering (2021-08-24)
Semantic Communication with Adaptive Universal Transformer (2021-08-20)
Automatically Ranked Russian Paraphrase Corpus for Text Generation (2020-06-17)
Universal Transforming Geometric Network (2019-08-02)
Latent Universal Task-Specific BERT (2019-05-16)
Self-Attentive Model for Headline Generation (2019-01-23)
Attending to Mathematical Language with Transformers (2018-12-05)
Universal Transformers (2018-07-10)