AI Dynamics

Global AI News Aggregator

BF16 to FP8 Quantization: Per-Channel Scaling for LLM Accuracy

The tricky part: naïvely casting BF16 group scales to FP8 degraded quality. Our fix: quantize the scales per channel (outer vector scaling) and rescale by 1/8 to avoid FP8 clipping. Result: >99.5% of W4A16 accuracy recovered on Command A & Cohere MoE. Paired with a CUTLASS
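As a rough illustration of the idea described above (not Cohere's actual code), the sketch below assumes FP8 E4M3 with a maximum representable magnitude of 448 and simulates quantizing BF16 group scales to FP8 using one per-channel scale (outer vector scaling) plus the fixed 1/8 rescale for headroom against clipping. All function and variable names are hypothetical, and FP8 rounding is approximated by a simple clamp.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3

def quantize_scales_per_channel(group_scales: np.ndarray,
                                rescale: float = 1.0 / 8.0):
    """Map positive group scales [num_groups, num_channels] into the FP8
    range with one dequantization factor per output channel, so that
    fp8_scales * channel_scales ~= group_scales."""
    # Per-channel max, so the largest scale in each channel lands at
    # rescale * FP8_E4M3_MAX -- the 1/8 factor leaves clipping headroom.
    channel_max = np.abs(group_scales).max(axis=0)           # [num_channels]
    channel_scales = channel_max / (rescale * FP8_E4M3_MAX)  # dequant factors
    # "Cast to FP8": clamp to the representable range to simulate
    # saturation; a real kernel would also round to the e4m3 grid.
    fp8_scales = np.clip(group_scales / channel_scales,
                         -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return fp8_scales, channel_scales

# Toy example: random positive group scales for a 32-group, 128-channel weight.
scales = np.random.uniform(0.01, 2.0, size=(32, 128)).astype(np.float32)
q, s = quantize_scales_per_channel(scales)
recon = q * s  # dequantized scales, close to the originals
```

With the 1/8 rescale, the quantized scales peak at 56 rather than 448, which is the kind of headroom the post credits with avoiding FP8 clipping.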

→ View original post on X — @cohere
