“Adaptive Block-Scaled Data Types” Most 4-bit LLM quantization schemes assume every block should use the same number format; this paper argues that’s the wrong abstraction. Instead, each 16-value block chooses between FP4 and scaled INT4, whichever yields lower quantization error. The resulting format, IF4 (Int/Float 4), stores that choice in the otherwise-unused sign bit of the block’s shared FP8 scale factor, so the adaptivity costs no extra storage. The motivation: at 4 bits, precision is so scarce that matching the format to the local value distribution matters a great deal. The result is lower quantization error and better accuracy than existing 4-bit block-scaled formats, in both training and post-training quantization (PTQ).
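A minimal sketch of the per-block idea, not the paper's actual implementation: quantize one 16-value block with a non-uniform FP4-style grid (assuming the common E2M1 value set) and with a uniform symmetric INT4 grid, then keep whichever gives lower squared error. The helper names are hypothetical, and the real IF4 format additionally packs the chosen flag into the FP8 scale's sign bit, which is omitted here.

```python
# FP4 E2M1 representable values (assumption: the common MX-style grid),
# mirrored to both signs; max magnitude is 6.0.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_VALUES = sorted({s * v for v in FP4_GRID for s in (1.0, -1.0)})
INT4_VALUES = [float(i) for i in range(-7, 8)]  # symmetric INT4 codes

def quantize(block, grid, grid_max):
    """Scale the block so its absmax maps to grid_max, round each value
    to the nearest grid point, and return the dequantized block."""
    absmax = max(abs(x) for x in block)
    if absmax == 0.0:
        return list(block)
    scale = absmax / grid_max
    return [scale * min(grid, key=lambda g: abs(x / scale - g)) for x in block]

def adaptive_quantize(block):
    """Per-block format choice: whichever 4-bit grid has lower squared error."""
    fp4 = quantize(block, FP4_VALUES, 6.0)
    int4 = quantize(block, INT4_VALUES, 7.0)
    err = lambda q: sum((x - y) ** 2 for x, y in zip(block, q))
    return ("FP4", fp4) if err(fp4) <= err(int4) else ("INT4", int4)

# A 16-value block of small values plus one outlier: the non-uniform FP4
# grid keeps finer steps near zero after scaling, so it tends to win here.
block = [0.03 * i for i in range(15)] + [5.0]
fmt, q = adaptive_quantize(block)
```

A near-uniform block without outliers would instead favor INT4's evenly spaced codes, which is exactly the distribution-dependence the paper exploits.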
→ View original post on X — @askalphaxiv, 2026-04-02 06:54 UTC
