Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Cosine Normalization

General · Introduced 2017 · 2 papers
Source Paper

Description

Multi-layer neural networks traditionally use the dot product between the output vector of the previous layer and the incoming weight vector as the input to the activation function. The result of a dot product is unbounded. To bound the pre-activation and reduce its variance, Cosine Normalization replaces the dot product with cosine similarity or centered cosine similarity (the Pearson correlation coefficient).

With cosine normalization, the output of a hidden unit is computed as:

$$o = f(net_{norm}) = f(\cos \theta) = f\left(\frac{\vec{w} \cdot \vec{x}}{\left|\vec{w}\right| \left|\vec{x}\right|}\right)$$

where $net_{norm}$ is the normalized pre-activation, $\vec{w}$ is the incoming weight vector, $\vec{x}$ is the input vector, $\cdot$ denotes the dot product, and $f$ is a nonlinear activation function. Cosine normalization bounds the pre-activation to $[-1, 1]$.
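The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names and the `eps` stabilizer (guarding against zero-norm vectors) are my own, and `tanh` stands in for an arbitrary nonlinearity $f$.

```python
import numpy as np

def cosine_norm_preactivation(w, x, eps=1e-8):
    # Cosine similarity between weight and input vectors:
    # bounded in [-1, 1], unlike the raw dot product.
    return np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x) + eps)

def pearson_preactivation(w, x, eps=1e-8):
    # Centered cosine similarity (Pearson correlation coefficient):
    # subtract each vector's mean before normalizing.
    wc = w - w.mean()
    xc = x - x.mean()
    return np.dot(wc, xc) / (np.linalg.norm(wc) * np.linalg.norm(xc) + eps)

# Example: one hidden unit with a tanh activation.
w = np.array([3.0, -4.0])   # orthogonal to x, so cos(theta) = 0
x = np.array([4.0, 3.0])
net_norm = cosine_norm_preactivation(w, x)
o = np.tanh(net_norm)
```

Because `net_norm` is already in $[-1, 1]$ regardless of the magnitudes of $\vec{w}$ and $\vec{x}$, the activation input stays bounded even as weights grow during training.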

Papers Using This Method

- Class-incremental Learning with Rectified Feature-Graph Preservation (2020-12-15)
- Cosine Normalization: Using Cosine Similarity Instead of Dot Product in Neural Networks (2017-02-20)