Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Ternary Weight Splitting

General · Introduced 2020 · 2 papers
Source Paper

Description

Ternary Weight Splitting (TWS) is a ternarization-based approach used in BinaryBERT that exploits the flatness of the ternary loss landscape as an optimization proxy for the binary model. We first train the half-sized ternary BERT to convergence, and then split both the latent full-precision weight $\mathbf{w}^{t}$ and the quantized weight $\hat{\mathbf{w}}^{t}$ into their binary counterparts $\mathbf{w}_{1}^{b}, \mathbf{w}_{2}^{b}$ and $\hat{\mathbf{w}}_{1}^{b}, \hat{\mathbf{w}}_{2}^{b}$ via the TWS operator. To inherit the performance of the ternary model after splitting, the TWS operator requires splitting equivalency (i.e., the same output given the same input):

$$\mathbf{w}^{t}=\mathbf{w}_{1}^{b}+\mathbf{w}_{2}^{b}, \qquad \hat{\mathbf{w}}^{t}=\hat{\mathbf{w}}_{1}^{b}+\hat{\mathbf{w}}_{2}^{b}$$

While the solution to the above equation is not unique, we constrain the latent full-precision weights after splitting, $\mathbf{w}_{1}^{b}$ and $\mathbf{w}_{2}^{b}$, to satisfy $\mathbf{w}^{t}=\mathbf{w}_{1}^{b}+\mathbf{w}_{2}^{b}$. See the paper for more details.
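To make the splitting equivalency concrete, here is a minimal sketch of splitting the *quantized* ternary weights. It assumes a ternary tensor with values in $\{-\alpha, 0, +\alpha\}$ and produces two binary tensors with a shared scale $\beta = \alpha/2$: where the ternary value is nonzero the two binary copies agree in sign, and where it is zero they take opposite signs and cancel. The function name `split_ternary` and the choice $\beta = \alpha/2$ are illustrative; the paper's full TWS operator additionally splits the latent full-precision weights using analytically derived constants.

```python
import numpy as np

def split_ternary(w_hat, alpha):
    """Split a quantized ternary tensor w_hat (values in {-alpha, 0, +alpha})
    into two binary tensors with per-tensor scale beta = alpha / 2, such that
    w1 + w2 reproduces w_hat exactly (the splitting equivalency).

    Simplified illustration only; the actual TWS operator in BinaryBERT also
    handles the latent full-precision weights (see the paper).
    """
    beta = alpha / 2.0
    # first binary copy follows the sign pattern of w_hat (negative where zero)
    w1 = np.where(w_hat > 0, beta, -beta)
    # second copy agrees with w1 where w_hat != 0, flips sign where w_hat == 0
    w2 = np.where(w_hat != 0, w1, -w1)
    return w1, w2

# example: ternary weights quantized with scale alpha = 0.5
alpha = 0.5
w_hat = np.array([0.5, 0.0, -0.5, 0.0, 0.5])
w1, w2 = split_ternary(w_hat, alpha)
assert np.allclose(w1 + w2, w_hat)  # splitting equivalency holds elementwise
```

Because each output tensor takes only the two values $\pm\beta$, both halves are genuinely binary, yet their sum reconstructs the ternary layer exactly, so the binary model starts from the ternary model's loss.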

Papers Using This Method

- Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment (2024-07-16)
- BinaryBERT: Pushing the Limit of BERT Quantization (2020-12-31)