Description
NT-ASGD, or Non-monotonically Triggered ASGD, is an averaged stochastic gradient descent (ASGD) technique.
In regular ASGD, we take steps identical to regular SGD, but instead of returning the last iterate as the solution, we return the average $\frac{1}{K - T + 1}\sum_{i=T}^{K} w_i$, where $K$ is the total number of iterations and $T < K$ is a user-specified averaging trigger.
NT-ASGD has a non-monotonic criterion that conservatively triggers the averaging when the validation metric fails to improve for multiple cycles. Given that the choice of triggering is irreversible, this conservatism ensures that the randomness of training does not play a major role in the decision.
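The trigger logic above can be sketched as follows. This is an illustrative sketch, not a reference implementation: the function names (`train_step`, `evaluate`), the scalar weight, and the non-monotone interval `n` are all assumptions made for demonstration. Averaging fires once the current validation loss is worse than the best loss recorded more than `n` evaluations ago, and from that point on every iterate is folded into a running mean.

```python
def nt_asgd(train_step, evaluate, w, num_epochs, n=5):
    """Run SGD, switching irreversibly to iterate averaging once the
    validation metric has failed to improve for n evaluation cycles.

    train_step(w) -> new weights after one epoch of plain SGD
    evaluate(w)   -> validation loss for weights w
    """
    logs = []                # validation losses seen so far
    avg, count = None, 0     # running average of post-trigger iterates
    triggered = False
    for _ in range(num_epochs):
        w = train_step(w)    # ordinary SGD step(s)
        loss = evaluate(w)
        # Non-monotonic criterion: compare against the best loss seen
        # more than n evaluations ago, so recent noise cannot trigger it.
        if not triggered and len(logs) > n and loss > min(logs[:-n]):
            triggered = True  # the decision is irreversible
        logs.append(loss)
        if triggered:
            count += 1
            if avg is None:
                avg = w
            else:
                avg = avg + (w - avg) / count  # incremental mean
    return avg if triggered else w
```

Because the averaged solution only incorporates iterates from the trigger point onward, a conservative (large) `n` trades a later trigger for a decision that is robust to run-to-run noise in the validation metric.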
Papers Using This Method
- Probing for Referential Information in Language Models (2020-07-01)
- MaxUp: A Simple Way to Improve Generalization of Neural Network Training (2020-02-20)
- DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling (2019-11-27)
- A Subword Level Language Model for Bangla Language (2019-11-15)
- Language Informed Modeling of Code-Switched Text (2018-07-01)
- Regularizing and Optimizing LSTM Language Models (2017-08-07)