Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Adaptive Dropout

General · Introduced 2013 · 15 papers
Source Paper

Description

Adaptive Dropout is a regularization technique that extends dropout by allowing the dropout probability to differ across units. The intuition is that some hidden units can individually make confident predictions about the presence or absence of an important feature or combination of features. Standard dropout ignores this confidence and drops the unit out 50% of the time.

Denote the activity of unit $j$ in a deep neural network by $a_j$ and assume that its inputs are $\{a_i : i < j\}$. In dropout, $a_j$ is randomly set to zero with probability 0.5. Let $m_j$ be a binary variable used to mask the activity $a_j$, so that its value is:

$$a_j = m_j \, g\left( \sum_{i : i < j} w_{j,i} \, a_i \right)$$

where $w_{j,i}$ is the weight from unit $i$ to unit $j$, $g(\cdot)$ is the activation function, and $a_0 = 1$ accounts for biases. Whereas in standard dropout $m_j$ is Bernoulli with probability $0.5$, adaptive dropout uses dropout probabilities that depend on the input activities:

$$P\left(m_j = 1 \mid \{a_i : i < j\}\right) = f\left( \sum_{i : i < j} \pi_{j,i} \, a_i \right)$$

where $\pi_{j,i}$ is the weight from unit $i$ to unit $j$ in the standout network (the adaptive dropout network) and $f(\cdot)$ is a sigmoidal function. Here "standout" refers to a binary belief network that is overlaid on the neural network as part of the overall regularization technique.
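The two equations above can be sketched as a single layer's forward pass. This is a minimal NumPy illustration, not the authors' implementation; the function and variable names (`adaptive_dropout_layer`, `w`, `pi`) are chosen here for clarity, and the bias is folded in as the first input unit $a_0 = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_dropout_layer(a_in, w, pi, g=np.tanh, rng=rng):
    """One layer with standout-style adaptive dropout (illustrative sketch).

    a_in : (n_in,)        input activities; a_in[0] = 1 plays the role of a_0
    w    : (n_out, n_in)  weights w_{j,i} of the main network
    pi   : (n_out, n_in)  weights pi_{j,i} of the standout (belief) network
    """
    # Keep probability is computed from the same inputs:
    # P(m_j = 1 | {a_i}) = f(sum_i pi_{j,i} a_i), with f sigmoidal
    keep_prob = sigmoid(pi @ a_in)
    # Sample the binary mask m_j ~ Bernoulli(keep_prob_j)
    m = (rng.random(keep_prob.shape) < keep_prob).astype(a_in.dtype)
    # Masked activity: a_j = m_j * g(sum_i w_{j,i} a_i)
    return m * g(w @ a_in)

# Usage: 3 inputs (first is the bias unit), 4 output units
a = np.array([1.0, 0.5, -0.3])
w = rng.normal(size=(4, 3))
pi = rng.normal(size=(4, 3))
out = adaptive_dropout_layer(a, w, pi)
```

Note that, unlike standard dropout's fixed rate, units whose standout pre-activation is large and positive are kept almost always, while others are dropped almost surely; the mask therefore adapts to the input.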

Papers Using This Method

- Efficient Federated Learning with Heterogeneous Data and Adaptive Dropout (2025-07-14)
- A statistical physics framework for optimal learning (2025-07-10)
- DOTA: Deformable Optimized Transformer Architecture for End-to-End Text Recognition with Retrieval-Augmented Generation (2025-05-07)
- Beyond Overfitting: Doubly Adaptive Dropout for Generalizable AU Detection (2025-03-12)
- Dynamic DropConnect: Enhancing Neural Network Robustness through Adaptive Edge Dropping Strategies (2025-02-27)
- 2-Tier SimCSE: Elevating BERT for Robust Sentence Embeddings (2025-01-23)
- Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution (2025-01-01)
- Adaptive Dropout for Pruning Conformers (2024-12-06)
- Communication-Efficient Split Learning via Adaptive Feature-Wise Compression (2023-07-20)
- FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout (2023-07-14)
- The Flip Side of the Reweighted Coin: Duality of Adaptive Dropout and Regularization (2021-06-14)
- Advanced Dropout: A Model-free Methodology for Bayesian Dropout Optimization (2020-10-11)
- Adaptive Low-Rank Factorization to regularize shallow and deep neural networks (2020-05-05)
- Improved Dropout for Shallow and Deep Learning (2016-02-06)
- Adaptive dropout for training deep neural networks (2013-12-01)