Yeah, I get it. It's just like pass@k, which also improves a lot, and predictably so, as you increase k! I think a compute budget is the best compromise in this case, as it's a bit more grounded and less biased than token counts…
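To make the pass@k point concrete, here's a rough sketch using the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn and c is how many of them are correct. The function name `pass_at_k` and the numbers (20 samples, 2 correct) are made up for illustration, not taken from any real eval.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples, of which c are correct."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset has no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical numbers: 20 samples per problem, 2 of them correct.
    for k in (1, 5, 10, 20):
        print(f"pass@{k} = {pass_at_k(20, 2, k):.3f}")
```

With the per-sample success rate held fixed, the estimate only goes up as k grows, which is the same "more attempts, better score" effect you'd want any budget comparison to account for.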
Compute Budget vs Token Count: Scaling Model Performance