Dan found that the 2-bit quantization broke tool calling but upgrading to 4-bit (at 4.36 tokens/second) got that working
4-bit Quantization Fixes Tool Calling Performance Issues
By
–
Global AI News Aggregator
By
–
Dan found that the 2-bit quantization broke tool calling but upgrading to 4-bit (at 4.36 tokens/second) got that working
Leave a Reply