All models struggle in this benchmark because languages are: Brainfuck, Whitespace, Unlambda, Shakespeare. 😅
— Alex J. Champandard 🌱 (@alexjc) March 19, 2026
If you actually pick a useful but still esoteric language like Joy, the frontier models do great (they *can* reason), but the open source ones struggle (they memorize). https://t.co/E5Mozy0yEB