Description
Variational Dropout is a regularization technique for recurrent neural networks that grounds dropout in variational inference. In Variational Dropout, the same dropout mask is reused at every time step for the inputs, outputs, and recurrent connections, so the same network units are dropped throughout the sequence. This contrasts with ordinary dropout applied to RNNs, where a new mask is sampled at each time step, and only for the inputs and outputs; the recurrent connections receive no dropout at all.
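As a concrete illustration, below is a minimal PyTorch sketch of the idea. The module and parameter names (`VariationalDropout`, `VariationalLSTM`, `p`) are hypothetical, not taken from any particular paper's code: a single Bernoulli mask is sampled once per sequence and reused at every time step, including on the recurrent hidden state.

```python
import torch
import torch.nn as nn


class VariationalDropout(nn.Module):
    """Drops the same units at every time step of a (batch, seq, features) tensor."""

    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training or self.p == 0.0:
            return x
        # One mask per sequence, broadcast over the time dimension, so the
        # same features are zeroed at every time step.
        mask = x.new_empty(x.size(0), 1, x.size(2)).bernoulli_(1 - self.p)
        return x * mask / (1 - self.p)


class VariationalLSTM(nn.Module):
    """Single-layer LSTM with time-shared dropout masks on the inputs and
    the recurrent hidden state (a sketch, not a reference implementation)."""

    def __init__(self, input_size: int, hidden_size: int, p: float = 0.3):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.hidden_size = hidden_size
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, _ = x.shape
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        if self.training:
            keep = 1.0 - self.p
            # Masks are sampled once per forward pass, not once per time step.
            in_mask = x.new_empty(batch, x.size(2)).bernoulli_(keep) / keep
            h_mask = x.new_empty(batch, self.hidden_size).bernoulli_(keep) / keep
        outputs = []
        for t in range(seq_len):
            x_t = x[:, t]
            if self.training:
                x_t = x_t * in_mask  # same input mask at every step
                h = h * h_mask       # same recurrent mask at every step
            h, c = self.cell(x_t, (h, c))
            outputs.append(h)
        return torch.stack(outputs, dim=1)
```

In a full model, `VariationalDropout` could be applied once to the embedding output and once to the stacked RNN outputs, while the reuse of `h_mask` inside the loop gives the recurrent connection the same time-shared treatment; for example, `VariationalLSTM(16, 32)(torch.randn(4, 10, 16))` returns a `(4, 10, 32)` tensor.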
Papers Using This Method
- RLBenchNet: The Right Network for the Right Reinforcement Learning Task (2025-05-21)
- Advanced Deep Learning Techniques for Analyzing Earnings Call Transcripts: Methodologies and Applications (2025-02-27)
- BARNN: A Bayesian Autoregressive and Recurrent Neural Network (2025-01-30)
- A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation (2024-11-19)
- No Argument Left Behind: Overlapping Chunks for Faster Processing of Arbitrarily Long Legal Texts (2024-10-24)
- Large Body Language Models (2024-10-21)
- RICo: Reddit ideological communities (2024-06-05)
- Transformers for Supervised Online Continual Learning (2024-03-03)
- UniMem: Towards a Unified View of Long-Context Large Language Models (2024-02-05)
- Exploring Multi-Level Threats in Telegram Data with AI-Human Annotation: A Preliminary Study (2023-12-15)
- Illicit Darkweb Classification via Natural-language Processing: Classifying Illicit Content of Webpages based on Textual Information (2023-12-08)
- Memory-efficient Stochastic methods for Memory-based Transformers (2023-11-14)
- TRAMS: Training-free Memory Selection for Long-range Language Modeling (2023-10-24)
- Approximating Two-Layer Feedforward Networks for Efficient Transformers (2023-10-16)
- Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents (2023-09-29)
- Random-Access Infinite Context Length for Transformers (2023-09-21)
- RCMHA: Relative Convolutional Multi-Head Attention for Natural Language Modelling (2023-08-07)
- Landmark Attention: Random-Access Infinite Context Length for Transformers (2023-05-25)
- Sparsified Model Zoo Twins: Investigating Populations of Sparsified Neural Network Models (2023-04-26)
- Transformer-based World Models Are Happy With 100k Interactions (2023-03-13)