BERT-Large 32k batch size with AdamW

Reported on 1 benchmark across 1 task · 1 paper

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 1 result

Question Answering on SQuAD1.1
F1 · 2021-02-12
91.58
best: 95.719 ({ANNA} (single model))
A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes · arXiv:2102.06356