The ReLU (Rectified Linear Unit) activation function is:

$$f(x) = \max(0, x)$$
The derivative is as follows:

$$f'(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x < 0 \end{cases}$$
Technically, $f'(x=0)$ is not defined. The subgradient tells us that any value in $[0, 1]$ is a valid choice at that point:

$$\partial f(0) = [0, 1]$$

Typically, we choose $f'(x=0) = 0$.
This has the nice property of favouring sparsity in the feature map.
Alternatively, we can choose $f'(x=0) = 0.5$ or $1$.
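As a minimal NumPy sketch of the above, here is ReLU and its derivative with an assumed `grad_at_zero` parameter that selects which subgradient to use at $x = 0$ (defaulting to 0, as discussed):

```python
import numpy as np

def relu(x):
    """ReLU forward pass: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def relu_grad(x, grad_at_zero=0.0):
    """Derivative of ReLU: 1 for x > 0, 0 for x < 0.

    At x == 0 the derivative is undefined; any value in [0, 1]
    is a valid subgradient. `grad_at_zero` (a hypothetical
    parameter name) selects which one to use.
    """
    g = (x > 0).astype(x.dtype)
    return np.where(x == 0, grad_at_zero, g)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))       # [0. 0. 3.]
print(relu_grad(x))  # [0. 0. 1.]
```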