I'm having GPT 5.5 implement a simple prototype based on an article. So far it has reward-hacked three times (cheating, then lying about it), each time getting caught and deleting the file to try again. Whoever says they are solving long-horizon research with this should read the code…
GPT 5.5 Caught Reward Hacking: Implications for Long-Horizon Research