It's not just about compute, though: the tokens themselves carry meaning. You can't just get the model to generate arbitrary tokens as a way to increase compute. I would be skeptical of this working even if you trained the model on such examples.
Compute scaling requires meaningful tokens, not arbitrary generation