From my perspective, the point of raising the example of natural selection is that it debunks the naive belief that, in general, if an outer optimizer trains on a reward function, it gets an inner optimizer that pursues that reward function OOD. Saying "But SGD is first-order and
Outer Optimizers and Inner Optimizers: Beyond Naive Reward Function Pursuit