Derivative of ReLU

The ReLU (Rectified Linear Unit) activation function is:

$$
f(x) = \max(0, x)
$$

The derivative is as follows:

$$
f'(x) =
\begin{cases}
1 & \text{if } x > 0 \\
0 & \text{if } x < 0
\end{cases}
$$

Technically, $f'(0)$ is not defined, since the left and right derivatives disagree there. The subgradient tells us that any value in $[0, 1]$ is a valid choice:

$$
\partial f(0) = [0, 1]
$$

Typically, we choose $f'(0) = 0$.
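
As a minimal sketch (using NumPy here, which is my own choice rather than anything prescribed above), the forward pass and this gradient convention can be written so that the strict comparison $x > 0$ automatically assigns a derivative of $0$ at $x = 0$:

```python
import numpy as np

def relu(x):
    """ReLU forward pass: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Derivative of ReLU using the common convention f'(0) = 0.

    The strict comparison x > 0 sends x = 0 to the zero branch.
    """
    return np.where(x > 0, 1.0, 0.0)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))       # [0. 0. 3.]
print(relu_grad(x))  # [0. 0. 1.]
```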

Choosing $f'(0) = 0$ has the nice property of favouring sparsity in the feature map, since a unit with non-positive input contributes neither an activation nor a gradient.

Alternatively, we can choose $f'(0) = 0.5$ or $f'(0) = 1$.
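
To illustrate, a variant of the gradient above can take the value at zero as a parameter (the name `value_at_zero` is mine, introduced only for this sketch); any value in $[0, 1]$ is a valid subgradient:

```python
import numpy as np

def relu_grad_at_zero(x, value_at_zero=0.0):
    """Derivative of ReLU with a configurable subgradient at x = 0.

    value_at_zero may be any value in [0, 1], e.g. 0.0, 0.5, or 1.0.
    """
    grad = np.where(x > 0, 1.0, 0.0)
    return np.where(x == 0, value_at_zero, grad)

x = np.array([-2.0, 0.0, 3.0])
print(relu_grad_at_zero(x, value_at_zero=0.5))  # [0.  0.5 1. ]
```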