Is it confirmed that they used more GPU hours? Sure, they have that 200k-GPU cluster, but did they use all of it for the final model run, or is it more there to facilitate multiple parallel experiments?