AI Dynamics

Global AI News Aggregator

About

4-bit Quantization Fixes Tool Calling Performance Issues

Dan found that the 2-bit quantization broke tool calling but upgrading to 4-bit (at 4.36 tokens/second) got that working

→ View original post on X — @simonw,