Description
Variational Dropout is a regularization technique for recurrent neural networks that grounds dropout in variational inference. In Variational Dropout, the same dropout mask is reused at every time step for the inputs, outputs, and recurrent connections, so the same network units are dropped throughout the sequence. This contrasts with ordinary dropout applied to RNNs, where a new mask is sampled at each time step, and only for the inputs and outputs; the recurrent connections receive no dropout at all.
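As a concrete illustration, below is a minimal PyTorch sketch of the idea. The module and parameter names (`VariationalDropout`, `VariationalLSTM`, `p`) are hypothetical, not taken from any particular paper's code: a single Bernoulli mask is sampled once per sequence and reused at every time step, including on the recurrent hidden state.

```python
import torch
import torch.nn as nn


class VariationalDropout(nn.Module):
    """Drops the same units at every time step of a (batch, seq, features) tensor."""

    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training or self.p == 0.0:
            return x
        # One mask per sequence, broadcast over the time dimension, so the
        # same features are zeroed at every time step.
        mask = x.new_empty(x.size(0), 1, x.size(2)).bernoulli_(1 - self.p)
        return x * mask / (1 - self.p)


class VariationalLSTM(nn.Module):
    """Single-layer LSTM with time-shared dropout masks on the inputs and
    the recurrent hidden state (a sketch, not a reference implementation)."""

    def __init__(self, input_size: int, hidden_size: int, p: float = 0.3):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.hidden_size = hidden_size
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, _ = x.shape
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        if self.training:
            keep = 1.0 - self.p
            # Masks are sampled once per forward pass, not once per time step.
            in_mask = x.new_empty(batch, x.size(2)).bernoulli_(keep) / keep
            h_mask = x.new_empty(batch, self.hidden_size).bernoulli_(keep) / keep
        outputs = []
        for t in range(seq_len):
            x_t = x[:, t]
            if self.training:
                x_t = x_t * in_mask  # same input mask at every step
                h = h * h_mask       # same recurrent mask at every step
            h, c = self.cell(x_t, (h, c))
            outputs.append(h)
        return torch.stack(outputs, dim=1)
```

In a full model, `VariationalDropout` could be applied once to the embedding output and once to the stacked RNN outputs, while the reuse of `h_mask` inside the loop gives the recurrent connection the same time-shared treatment; for example, `VariationalLSTM(16, 32)(torch.randn(4, 10, 16))` returns a `(4, 10, 32)` tensor.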
Papers Using This Method
- RLBenchNet: The Right Network for the Right Reinforcement Learning Task (2025-05-21)
- Advanced Deep Learning Techniques for Analyzing Earnings Call Transcripts: Methodologies and Applications (2025-02-27)
- BARNN: A Bayesian Autoregressive and Recurrent Neural Network (2025-01-30)
- A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation (2024-11-19)
- No Argument Left Behind: Overlapping Chunks for Faster Processing of Arbitrarily Long Legal Texts (2024-10-24)
- Large Body Language Models (2024-10-21)
- RICo: Reddit ideological communities (2024-06-05)
- Transformers for Supervised Online Continual Learning (2024-03-03)
- UniMem: Towards a Unified View of Long-Context Large Language Models (2024-02-05)
- Exploring Multi-Level Threats in Telegram Data with AI-Human Annotation: A Preliminary Study (2023-12-15)
- Illicit Darkweb Classification via Natural-language Processing: Classifying Illicit Content of Webpages based on Textual Information (2023-12-08)
- Memory-efficient Stochastic methods for Memory-based Transformers (2023-11-14)
- TRAMS: Training-free Memory Selection for Long-range Language Modeling (2023-10-24)
- Approximating Two-Layer Feedforward Networks for Efficient Transformers (2023-10-16)
- Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents (2023-09-29)
- Random-Access Infinite Context Length for Transformers (2023-09-21)
- RCMHA: Relative Convolutional Multi-Head Attention for Natural Language Modelling (2023-08-07)
- Landmark Attention: Random-Access Infinite Context Length for Transformers (2023-05-25)
- Sparsified Model Zoo Twins: Investigating Populations of Sparsified Neural Network Models (2023-04-26)
- Transformer-based World Models Are Happy With 100k Interactions (2023-03-13)