This new paper suggests that LLM ‘aha moments’ arise from an emergent planning-vs-execution hierarchy, similar to HRM’s slow-planner/fast-executor idea So they proposed HICRA which amplifies per-token credit on scarce planning tokens, focusing strategy & often beating GRPO!
LLM Planning-Execution Hierarchy Emerges Through Credit Assignment
By
–
