A key insight from chain-of-thought reasoning concerns information density. A language model can do only a limited amount of computation in a single forward pass, so the compute it spends must scale in proportion to how hard a prompt is to solve. What is
Chain-of-Thought Information Density and Compute Scaling