Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Ternary Weight Splitting

General · Introduced 2020 · 2 papers
Source Paper

Description

Ternary Weight Splitting (TWS) is a ternarization-based approach used in BinaryBERT that exploits the flatness of the ternary loss landscape as an optimization proxy for the binary model. We first train the half-sized ternary BERT to convergence, and then split both the latent full-precision weight $\mathbf{w}^{t}$ and the quantized weight $\hat{\mathbf{w}}^{t}$ into their binary counterparts $\mathbf{w}_{1}^{b}, \mathbf{w}_{2}^{b}$ and $\hat{\mathbf{w}}_{1}^{b}, \hat{\mathbf{w}}_{2}^{b}$ via the TWS operator. To inherit the performance of the ternary model after splitting, the TWS operator requires splitting equivalency (i.e., the same output given the same input):

$$\mathbf{w}^{t}=\mathbf{w}_{1}^{b}+\mathbf{w}_{2}^{b}, \qquad \hat{\mathbf{w}}^{t}=\hat{\mathbf{w}}_{1}^{b}+\hat{\mathbf{w}}_{2}^{b}$$

While the solution to the above equation is not unique, we constrain the latent full-precision weights after splitting, $\mathbf{w}_{1}^{b}$ and $\mathbf{w}_{2}^{b}$, to satisfy $\mathbf{w}^{t}=\mathbf{w}_{1}^{b}+\mathbf{w}_{2}^{b}$. See the paper for more details.
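To make the splitting equivalency concrete, here is a minimal sketch of splitting the *quantized* ternary weights. It assumes a ternary tensor with values in $\{-\alpha, 0, +\alpha\}$ and produces two binary tensors with a shared scale $\beta = \alpha/2$: where the ternary value is nonzero the two binary copies agree in sign, and where it is zero they take opposite signs and cancel. The function name `split_ternary` and the choice $\beta = \alpha/2$ are illustrative; the paper's full TWS operator additionally splits the latent full-precision weights using analytically derived constants.

```python
import numpy as np

def split_ternary(w_hat, alpha):
    """Split a quantized ternary tensor w_hat (values in {-alpha, 0, +alpha})
    into two binary tensors with per-tensor scale beta = alpha / 2, such that
    w1 + w2 reproduces w_hat exactly (the splitting equivalency).

    Simplified illustration only; the actual TWS operator in BinaryBERT also
    handles the latent full-precision weights (see the paper).
    """
    beta = alpha / 2.0
    # first binary copy follows the sign pattern of w_hat (negative where zero)
    w1 = np.where(w_hat > 0, beta, -beta)
    # second copy agrees with w1 where w_hat != 0, flips sign where w_hat == 0
    w2 = np.where(w_hat != 0, w1, -w1)
    return w1, w2

# example: ternary weights quantized with scale alpha = 0.5
alpha = 0.5
w_hat = np.array([0.5, 0.0, -0.5, 0.0, 0.5])
w1, w2 = split_ternary(w_hat, alpha)
assert np.allclose(w1 + w2, w_hat)  # splitting equivalency holds elementwise
```

Because each output tensor takes only the two values $\pm\beta$, both halves are genuinely binary, yet their sum reconstructs the ternary layer exactly, so the binary model starts from the ternary model's loss.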

Papers Using This Method

- Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment (2024-07-16)
- BinaryBERT: Pushing the Limit of BERT Quantization (2020-12-31)