I guess this is where we disagree about whether an error of 3261X in estimating the cost of training a language model, or conflating a completely different one-time task to perform a neural architecture search (and overestimating that cost by 88X) to find more energy-efficient
Training Cost Estimation Errors in Large Language Models