10/ FP8-LM: finds that most variables in LLM training, such as gradients and optimizer states, can use low-precision FP8 data formats without compromising model accuracy and without requiring any changes to hyperparameters.
FP8 Low-Precision Training Improves LLM Efficiency Without Hyperparameter Changes
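To make the claim concrete, here is a minimal NumPy sketch of why small values like gradients can survive an 8-bit format at all. It simulates rounding to the FP8 E4M3 format (3 mantissa bits, largest finite value 448) and applies a per-tensor scaling factor before the cast. The scaling scheme and the helper names (`round_to_e4m3`, `fp8_quantize`, `fp8_dequantize`) are assumptions made for illustration, not the FP8-LM implementation, and the rounding is simulated in float64 rather than using a real FP8 dtype.

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def round_to_e4m3(x):
    """Simulate rounding to E4M3 precision (hypothetical helper, not a library call)."""
    x = np.clip(np.asarray(x, dtype=np.float64), -E4M3_MAX, E4M3_MAX)
    _, exp = np.frexp(x)                # |x| lies in [2**(exp-1), 2**exp)
    exp = np.maximum(exp - 1, -6)       # floor(log2|x|), clamped at the smallest normal exponent
    quantum = np.ldexp(1.0, exp - 3)    # spacing between representable values (3 mantissa bits)
    return np.round(x / quantum) * quantum


def fp8_quantize(t):
    """Scale the tensor so its largest magnitude maps to E4M3_MAX, then round."""
    amax = np.max(np.abs(t))
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return round_to_e4m3(t * scale), scale


def fp8_dequantize(q, scale):
    """Undo the per-tensor scaling."""
    return q / scale


# Illustrative gradient values (made up for this example).
grads = np.array([1e-5, -3e-5, 2.5e-5, 7e-6])

naive = round_to_e4m3(grads)            # direct cast: values underflow
q, scale = fp8_quantize(grads)          # scaled cast
restored = fp8_dequantize(q, scale)

print("naive cast:   ", naive)
print("scaled cast:  ", restored)
print("max rel. err.:", np.max(np.abs(restored - grads) / np.abs(grads)))
```

Cast directly, these gradients fall below the smallest E4M3 subnormal and round to zero. With a per-tensor scale they land in the format's normal range, where the rounding error is bounded by half the spacing between representable values.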
