I do think that better compaction, plus teaching the models to re-learn context after compaction when they are unsure, solves the need for really long context windows to an extent. That said, GPT 5.2 and the Codex variants both support a 400K context window, which is a step up.
