Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Cosine Normalization

General · Introduced 2017 · 2 papers
Source Paper

Description

Multi-layer neural networks traditionally use the dot product between the output vector of the previous layer and the incoming weight vector as the input to the activation function. The result of a dot product is unbounded. To bound the pre-activation and reduce its variance, Cosine Normalization replaces the dot product with cosine similarity or centered cosine similarity (the Pearson correlation coefficient).

With cosine normalization, the output of a hidden unit is computed as:

$$o = f(net_{norm}) = f(\cos \theta) = f\left(\frac{\vec{w} \cdot \vec{x}}{\left|\vec{w}\right| \left|\vec{x}\right|}\right)$$

where $net_{norm}$ is the normalized pre-activation, $\vec{w}$ is the incoming weight vector, $\vec{x}$ is the input vector, $\cdot$ denotes the dot product, and $f$ is a nonlinear activation function. Cosine normalization bounds the pre-activation to $[-1, 1]$.
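The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names and the `eps` stabilizer (guarding against zero-norm vectors) are my own, and `tanh` stands in for an arbitrary nonlinearity $f$.

```python
import numpy as np

def cosine_norm_preactivation(w, x, eps=1e-8):
    # Cosine similarity between weight and input vectors:
    # bounded in [-1, 1], unlike the raw dot product.
    return np.dot(w, x) / (np.linalg.norm(w) * np.linalg.norm(x) + eps)

def pearson_preactivation(w, x, eps=1e-8):
    # Centered cosine similarity (Pearson correlation coefficient):
    # subtract each vector's mean before normalizing.
    wc = w - w.mean()
    xc = x - x.mean()
    return np.dot(wc, xc) / (np.linalg.norm(wc) * np.linalg.norm(xc) + eps)

# Example: one hidden unit with a tanh activation.
w = np.array([3.0, -4.0])   # orthogonal to x, so cos(theta) = 0
x = np.array([4.0, 3.0])
net_norm = cosine_norm_preactivation(w, x)
o = np.tanh(net_norm)
```

Because `net_norm` is already in $[-1, 1]$ regardless of the magnitudes of $\vec{w}$ and $\vec{x}$, the activation input stays bounded even as weights grow during training.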

Papers Using This Method

- Class-incremental Learning with Rectified Feature-Graph Preservation (2020-12-15)
- Cosine Normalization: Using Cosine Similarity Instead of Dot Product in Neural Networks (2017-02-20)