A Kernel Activation Function is a non-parametric activation function defined as a one-dimensional kernel approximator:
$f(s) = \sum_{i=1}^{D} \alpha_i \, \kappa(s, d_i)$
where:
- The dictionary of kernel elements $d_1, \ldots, d_D$ is fixed in advance by sampling the x-axis with a uniform step around 0.
- The user selects the kernel function (e.g., Gaussian, ReLU, Softplus) and the number of kernel elements $D$ as hyper-parameters. A larger dictionary yields a more expressive activation function at the cost of more trainable parameters.
- The linear (mixing) coefficients $\alpha_i$ are adapted independently at every neuron via standard back-propagation.
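The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the dictionary range, the number of elements, and the bandwidth heuristic (tying the Gaussian width to the dictionary step) are our assumptions, not prescribed by the text.

```python
import numpy as np

# Illustrative hyper-parameters (assumptions, not from the definition).
D = 20                                  # number of kernel elements
d = np.linspace(-3.0, 3.0, D)           # fixed dictionary: uniform step around 0
gamma = 1.0 / (2 * (d[1] - d[0]) ** 2)  # Gaussian bandwidth tied to the step size
alpha = 0.3 * np.random.randn(D)        # trainable mixing coefficients (one set per neuron)

def kaf(s):
    """Evaluate f(s) = sum_i alpha_i * exp(-gamma * (s - d_i)^2) elementwise."""
    s = np.asarray(s, dtype=float)
    # Kernel evaluations between each activation value and every dictionary point.
    K = np.exp(-gamma * (s[..., None] - d) ** 2)
    return K @ alpha

print(kaf(np.array([-1.0, 0.0, 1.0])).shape)  # one output per input activation
```

In a network, `alpha` would be a trainable parameter tensor with one row per neuron, updated by back-propagation along with the weights.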
In addition, the linear coefficients can be initialized using kernel ridge regression so that, at the beginning of the optimization process, the activation function behaves similarly to a known function.
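The kernel ridge regression initialization amounts to solving a small regularized linear system on the dictionary points. The sketch below fits the coefficients so the KAF starts out close to `tanh`; the target function, the regularization constant, and the bandwidth are illustrative choices on our part.

```python
import numpy as np

# Same illustrative dictionary as before (assumed, not prescribed).
D = 20
d = np.linspace(-3.0, 3.0, D)
gamma = 1.0 / (2 * (d[1] - d[0]) ** 2)

K = np.exp(-gamma * (d[:, None] - d[None, :]) ** 2)  # D x D kernel matrix on the dictionary
t = np.tanh(d)                                       # target values of the known function
lam = 1e-4                                           # ridge regularization (assumed value)
alpha0 = np.linalg.solve(K + lam * np.eye(D), t)     # KRR solution for the initial coefficients

# Check the initialized KAF against the target at an off-dictionary point.
approx = np.exp(-gamma * (0.5 - d) ** 2) @ alpha0
print(abs(approx - np.tanh(0.5)))
```

Starting from a well-behaved function such as `tanh` gives the optimizer a sensible initial shape, which back-propagation is then free to deform.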