And if you are looking for inference-mode quantization, you can pass `--quantize "bnb.nf4"` to the `generate/base.py` script as well.
I am currently using that for running CodeLlama 34B models.
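As a minimal sketch of that invocation, assuming a Lit-GPT-style checkout with the CodeLlama 34B weights already downloaded and converted (the checkpoint directory and prompt below are illustrative; adjust them for your setup):

```shell
# Run inference with 4-bit NormalFloat (NF4) quantization via bitsandbytes.
# The checkpoint path is an assumption -- point it at your converted weights.
python generate/base.py \
  --checkpoint_dir checkpoints/codellama/CodeLlama-34b-hf \
  --quantize "bnb.nf4" \
  --prompt "Write a function that reverses a linked list."
```

NF4 quantization loads the weights in 4-bit precision at inference time, which is what makes a 34B-parameter model fit on a single consumer GPU.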
Inference Mode Quantization with BNB NF4 for CodeLlama