If your learning algorithm is based on correlation rather than causation, it will struggle with overfitting. To understand something is to identify its minimal sufficient causal mechanisms. Parsimony isn't just elegance, it's generalization robustness.