A Kernel Activation Function is a non-parametric activation function defined as a one-dimensional kernel approximator:
$f(s) = \sum_{i=1}^{D} \alpha_i \, \kappa(s, d_i)$
where:
- The dictionary of kernel elements $d_1, \ldots, d_D$ is fixed in advance by sampling the x-axis with a uniform step around 0.
- The user selects the kernel function (e.g., Gaussian, ReLU, Softplus) and the number of kernel elements $D$ as hyper-parameters. A larger dictionary yields a more expressive activation function at the cost of more trainable parameters.
- The linear (mixing) coefficients $\alpha_i$ are adapted independently at every neuron via standard back-propagation.
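The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the dictionary range, the number of elements, and the bandwidth heuristic (tying the Gaussian width to the dictionary step) are our assumptions, not prescribed by the text.

```python
import numpy as np

# Illustrative hyper-parameters (assumptions, not from the definition).
D = 20                                  # number of kernel elements
d = np.linspace(-3.0, 3.0, D)           # fixed dictionary: uniform step around 0
gamma = 1.0 / (2 * (d[1] - d[0]) ** 2)  # Gaussian bandwidth tied to the step size
alpha = 0.3 * np.random.randn(D)        # trainable mixing coefficients (one set per neuron)

def kaf(s):
    """Evaluate f(s) = sum_i alpha_i * exp(-gamma * (s - d_i)^2) elementwise."""
    s = np.asarray(s, dtype=float)
    # Kernel evaluations between each activation value and every dictionary point.
    K = np.exp(-gamma * (s[..., None] - d) ** 2)
    return K @ alpha

print(kaf(np.array([-1.0, 0.0, 1.0])).shape)  # one output per input activation
```

In a network, `alpha` would be a trainable parameter tensor with one row per neuron, updated by back-propagation along with the weights.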
In addition, the linear coefficients can be initialized using kernel ridge regression so that, at the beginning of the optimization process, the activation function behaves similarly to a known function.
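The kernel ridge regression initialization amounts to solving a small regularized linear system on the dictionary points. The sketch below fits the coefficients so the KAF starts out close to `tanh`; the target function, the regularization constant, and the bandwidth are illustrative choices on our part.

```python
import numpy as np

# Same illustrative dictionary as before (assumed, not prescribed).
D = 20
d = np.linspace(-3.0, 3.0, D)
gamma = 1.0 / (2 * (d[1] - d[0]) ** 2)

K = np.exp(-gamma * (d[:, None] - d[None, :]) ** 2)  # D x D kernel matrix on the dictionary
t = np.tanh(d)                                       # target values of the known function
lam = 1e-4                                           # ridge regularization (assumed value)
alpha0 = np.linalg.solve(K + lam * np.eye(D), t)     # KRR solution for the initial coefficients

# Check the initialized KAF against the target at an off-dictionary point.
approx = np.exp(-gamma * (0.5 - d) ** 2) @ alpha0
print(abs(approx - np.tanh(0.5)))
```

Starting from a well-behaved function such as `tanh` gives the optimizer a sensible initial shape, which back-propagation is then free to deform.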