The softmax output function transforms a previous layer's output into a vector of probabilities that sum to 1. It is commonly used for multiclass classification. Given an input vector $x$ and a weight vector $w_k$ for each of the $K$ classes, we have:
$$P(y=j \mid x) = \frac{e^{x^{T}w_{j}}}{\sum_{k=1}^{K} e^{x^{T}w_{k}}}$$
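The formula can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `softmax_probabilities`, the example input, and the example weights are all hypothetical. Subtracting the largest logit before exponentiating is not part of the formula above. It is a common numerical-stability step, assumed here, that leaves the resulting probabilities unchanged.

```python
import math


def softmax_probabilities(x, weights):
    """Return [P(y=j | x) for each class j], given one weight vector per class.

    `x` is a list of floats; `weights` is a list of K lists, each the same
    length as `x`. Both names are illustrative assumptions.
    """
    # The logit for class j is the dot product x^T w_j.
    logits = [sum(xi * wi for xi, wi in zip(x, w_j)) for w_j in weights]

    # Assumed stability step: shifting every logit by the same constant
    # cancels in the ratio, but keeps exp() from overflowing.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]

    # Normalize so the outputs sum to 1.
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: a 2-dimensional input and K = 3 classes.
x = [1.0, 2.0]
weights = [[0.5, -0.5], [0.1, 0.3], [-0.2, 0.2]]
probs = softmax_probabilities(x, weights)
print(probs)
```

The class with the largest logit $x^{T}w_{j}$ always receives the largest probability, so taking the index of the maximum entry of `probs` gives the predicted class.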