"Language Models Need Sleep" Instead of thinking longer at answer time, this paper makes LLMs sleep before forgetting. They replay old context, write it into fast weights, clear the KV cache, and answer later at normal speed. More sleep improves deep reasoning over long and

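To make the replay, write, clear, answer loop concrete, here is a toy sketch in plain Python. It is not the paper's algorithm: it assumes a tiny linear associative memory as a stand-in for "fast weights" and a delta-rule update as a stand-in for sleep, and every name in it (`sleep`, `recall_error`, `run`, the learning rate `lr`) is hypothetical.

```python
# Toy illustration only. A small linear map W plays the role of "fast weights",
# and a list of (key, value) pairs plays the role of the KV cache.
# None of this is taken from the paper; all names are hypothetical.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sleep(W, cache, passes, lr=0.5):
    """Replay cached (key, value) pairs and fold them into W via delta-rule updates."""
    for _ in range(passes):
        for key, value in cache:
            pred = matvec(W, key)
            for i in range(len(W)):
                err = value[i] - pred[i]
                for j in range(len(key)):
                    W[i][j] += lr * err * key[j]
    return W

def recall_error(W, pairs):
    """Summed squared error when answering from W alone."""
    total = 0.0
    for key, value in pairs:
        pred = matvec(W, key)
        total += sum((v - p) ** 2 for v, p in zip(value, pred))
    return total

def run(passes):
    # "Old context": associations that would otherwise live only in the cache.
    context = [
        ([1.0, 0.0, 0.5], [1.0, 0.0]),
        ([0.0, 1.0, 0.5], [0.0, 1.0]),
    ]
    W = [[0.0] * 3 for _ in range(2)]  # fast weights start empty
    cache = list(context)
    sleep(W, cache, passes)            # replay context into the weights
    cache.clear()                      # drop the cache after sleeping
    return recall_error(W, context)    # later answers rely on W only

if __name__ == "__main__":
    for n in (0, 1, 5, 20):
        print(n, run(n))
```

In this toy setting the recall error shrinks as the number of replay passes grows, which loosely mirrors the claim that more sleep helps. It says nothing about how the actual method selects what to replay or how it updates a real model.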