LLaMa2-70B Training Costs: Computing Efficiency Analysis
Calculations:
LLaMa2-70B was trained on 2T tokens and took 3.3M hours of A100 GPU time. At a hardware FLOPs utilization (HFU) of ~60%, training LLaMa2-70B consumed ~4.4e24 FLOPs (3.3M GPU-hours × 3600 s/hour × 624 TFLOPS bf16 peak × 60% HFU).
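
As a quick sanity check, here is a minimal sketch of the arithmetic above in Python. The GPU-hour count, peak throughput, and HFU figure are the ones quoted in the text; the variable names are illustrative.

```python
# Back-of-the-envelope estimate of total training FLOPs for LLaMa2-70B,
# reproducing the calculation above. Figures are taken from the text.

gpu_hours = 3.3e6      # reported A100 GPU-hours
peak_flops = 624e12    # A100 bf16 peak throughput in FLOPs/s, as used in the text
hfu = 0.60             # assumed hardware FLOPs utilization (~60%)

seconds = gpu_hours * 3600                # convert GPU-hours to GPU-seconds
total_flops = seconds * peak_flops * hfu  # FLOPs actually executed at 60% HFU

print(f"Estimated training compute: {total_flops:.2e} FLOPs")
# -> Estimated training compute: 4.45e+24 FLOPs, matching the ~4.4e24 above
```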