Description
Unsupervised knowledge distillation from a pretrained language model to itself, by alternating between its bi- and cross-encoder forms.
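The alternation described above can be pictured as a loop in which each form of the model produces pseudo-labels for the other. The following is only a minimal sketch under stated assumptions, not the project's actual implementation: `alternate_distillation`, `bi_scorer`, `train_cross`, `train_bi`, and `cycles` are hypothetical names introduced for illustration, and the sketch assumes the process starts from the bi-encoder form and that each trainer returns a scoring function for sentence pairs.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical types for illustration only (not from the original project).
Pair = Tuple[str, str]
Scorer = Callable[[Sequence[Pair]], List[float]]
Trainer = Callable[[Sequence[Pair], Sequence[float]], Scorer]


def alternate_distillation(
    pairs: Sequence[Pair],
    bi_scorer: Scorer,
    train_cross: Trainer,
    train_bi: Trainer,
    cycles: int = 2,
) -> Tuple[Scorer, Optional[Scorer]]:
    """Alternate bi -> cross -> bi distillation on unlabeled sentence pairs.

    Assumption: the loop starts from the bi-encoder form; no gold labels
    are used anywhere, only scores produced by the model itself.
    """
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    cross_scorer: Optional[Scorer] = None
    for _ in range(cycles):
        # The bi-encoder form scores the unlabeled pairs; its scores
        # serve as pseudo-labels for training the cross-encoder form.
        pseudo_labels = bi_scorer(pairs)
        cross_scorer = train_cross(pairs, pseudo_labels)
        # The cross-encoder form relabels the same pairs, and those
        # scores are used in turn to train the bi-encoder form.
        pseudo_labels = cross_scorer(pairs)
        bi_scorer = train_bi(pairs, pseudo_labels)
    return bi_scorer, cross_scorer
```

In practice, `train_cross` and `train_bi` would wrap whatever fine-tuning routine is used for each form of the same pretrained model; here they are left as caller-supplied functions so the sketch stays independent of any particular library.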