why this breaks everything: RL progress has been bottlenecked by human intuition. researchers have insights, try variations, publish. it took years to go from Q-learning to DQN to PPO. now you just let the machine search directly. millions of variants in weeks instead of