The non-linearity by itself isn't complex; it can be something as simple as max(x, 0).
But the system of two matmuls with a max in between, combined with gradient descent, becomes fairly complex.
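To make that system concrete, here is a minimal sketch in plain Python: two matrix multiplications with max(x, 0) in between, fitted by full-batch gradient descent. Everything specific in it is an assumption chosen for illustration and does not come from the text above: the toy target y = |x|, the hidden width of 8, the learning rate, the step count, the seed, and the helper names (`matmul`, `relu`, `forward`, `train`).

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def relu(a):
    """Elementwise max(x, 0): the only non-linearity in the model."""
    return [[max(v, 0.0) for v in row] for row in a]

def forward(x, w1, w2):
    pre = matmul(x, w1)    # first matmul: (n, 1) @ (1, h) -> (n, h)
    hid = relu(pre)        # the max in between
    out = matmul(hid, w2)  # second matmul: (n, h) @ (h, 1) -> (n, 1)
    return pre, hid, out

def mse(out, y):
    n = len(y)
    return sum((out[i][0] - y[i][0]) ** 2 for i in range(n)) / n

def train(x, y, hidden=8, lr=0.01, steps=500, seed=0):
    """Full-batch gradient descent on mean squared error (illustrative settings)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(hidden)]]  # shape (1, h)
    w2 = [[rng.gauss(0.0, 0.5)] for _ in range(hidden)]  # shape (h, 1)
    n = len(x)
    history = []
    for _ in range(steps):
        pre, hid, out = forward(x, w1, w2)
        history.append(mse(out, y))
        # Backward pass, written out by hand.
        d_out = [[2.0 * (out[i][0] - y[i][0]) / n] for i in range(n)]  # (n, 1)
        g_w2 = matmul(transpose(hid), d_out)                           # (h, 1)
        d_hid = matmul(d_out, transpose(w2))                           # (n, h)
        # The gradient only flows through units where the max was active.
        d_pre = [[d_hid[i][j] if pre[i][j] > 0 else 0.0
                  for j in range(hidden)] for i in range(n)]
        g_w1 = matmul(transpose(x), d_pre)                             # (1, h)
        # Gradient descent update.
        w1 = [[w1[0][j] - lr * g_w1[0][j] for j in range(hidden)]]
        w2 = [[w2[j][0] - lr * g_w2[j][0]] for j in range(hidden)]
    history.append(mse(forward(x, w1, w2)[2], y))  # loss after the last update
    return w1, w2, history

if __name__ == "__main__":
    xs = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
    ys = [[abs(row[0])] for row in xs]  # toy target: y = |x|
    _, _, losses = train(xs, ys)
    print("initial loss:", losses[0])
    print("final loss:", losses[-1])
```

No single piece here is complicated: `relu` is one line and `matmul` is a nested sum. Whatever complexity appears comes from how the gradient descent loop shapes `w1` and `w2` over many steps.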
Emergent Complexity in Neural Networks Through Gradient Descent