Yes. GPUs were originally designed around 32-bit floating-point precision, but transformer workloads tolerate lower precision, and 16-bit is the usual standard. In recent months, people have pushed this further, down to FP8. One of the most recent architectures that successfully did FP8
GPU Precision Evolution: From 32-bit to FP8 in Transformers
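To make the 32-bit versus 16-bit trade-off concrete, here is a minimal sketch using only the Python standard library. It rounds the same small weight update to IEEE 754 single precision and half precision and shows what survives. The helper names (`round_to`, `to_fp32`, `to_fp16`) and the example values are assumptions made for illustration, not something from the post. FP8 is not covered because the standard library has no 8-bit float format.

```python
import struct

def round_to(fmt, x):
    """Round a Python float to the nearest value storable in the given struct format."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def to_fp32(x):
    # "<f" is IEEE 754 single precision (32 bits, 23 explicit mantissa bits)
    return round_to("<f", x)

def to_fp16(x):
    # "<e" is IEEE 754 half precision (16 bits, 10 explicit mantissa bits)
    return round_to("<e", x)

# Hypothetical values: a weight of 1.0 receiving a small update of 1e-4.
weight, update = 1.0, 1e-4

sum_fp32 = to_fp32(weight + update)  # the update survives rounding to 32 bits
sum_fp16 = to_fp16(weight + update)  # the update is rounded away in 16 bits

print("fp32 keeps the update:", sum_fp32 != weight)
print("fp16 keeps the update:", sum_fp16 != weight)
```

Near 1.0, half-precision values are spaced about 0.001 apart, so an update of 1e-4 disappears when the sum is stored in 16 bits. This is one reason low-precision training setups commonly keep some quantities, such as accumulations, in higher precision.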