Dan found that the 2-bit quantization broke tool calling but upgrading to 4-bit (at 4.36 tokens/second) got that working
4-bit Quantization Fixes Tool Calling Performance Issues
By
–

By
–

Dan found that the 2-bit quantization broke tool calling but upgrading to 4-bit (at 4.36 tokens/second) got that working