Given the new result, I see an extension to proving conditions under which gradient descent can learn general functions on two layer neural network based on score function tensor formulation we developed, which is generalization of Hermite polynomials for Gaussians.