Derivative of ReLU

The ReLU (Rectified Linear Unit) activation function is:

$$
f(x) = \max(0, x)
$$

The derivative is as follows:

$$
f'(x) =
\begin{cases}
1 & \text{if } x > 0 \\
0 & \text{if } x < 0
\end{cases}
$$

Technically, $f'(0)$ is not defined, since the left and right derivatives disagree there. The subgradient tells us that any value in $[0, 1]$ is a valid choice:

$$
\partial f(0) = [0, 1]
$$

Typically, we choose $f'(0) = 0$.
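
As a minimal sketch (using NumPy here, which is my own choice rather than anything prescribed above), the forward pass and this gradient convention can be written so that the strict comparison $x > 0$ automatically assigns a derivative of $0$ at $x = 0$:

```python
import numpy as np

def relu(x):
    """ReLU forward pass: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Derivative of ReLU using the common convention f'(0) = 0.

    The strict comparison x > 0 sends x = 0 to the zero branch.
    """
    return np.where(x > 0, 1.0, 0.0)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))       # [0. 0. 3.]
print(relu_grad(x))  # [0. 0. 1.]
```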

Choosing $f'(0) = 0$ has the nice property of favouring sparsity in the feature map, since a unit with non-positive input contributes neither an activation nor a gradient.

Alternatively, we can choose $f'(0) = 0.5$ or $f'(0) = 1$.
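
To illustrate, a variant of the gradient above can take the value at zero as a parameter (the name `value_at_zero` is mine, introduced only for this sketch); any value in $[0, 1]$ is a valid subgradient:

```python
import numpy as np

def relu_grad_at_zero(x, value_at_zero=0.0):
    """Derivative of ReLU with a configurable subgradient at x = 0.

    value_at_zero may be any value in [0, 1], e.g. 0.0, 0.5, or 1.0.
    """
    grad = np.where(x > 0, 1.0, 0.0)
    return np.where(x == 0, value_at_zero, grad)

x = np.array([-2.0, 0.0, 3.0])
print(relu_grad_at_zero(x, value_at_zero=0.5))  # [0.  0.5 1. ]
```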