Performance degrades even within 100k tokens, let alone across the whole currently supported window. After a few turns of coding in Python, it can no longer use its tools reliably. It requires constantly jumping back and/or offloading.
LLM Performance Degradation Over Extended Context Windows