The CodeLlama PR just got merged: https://github.com/Lightning-AI/lit-gpt/pull/472
When I tried it with bnb's 4-bit NormalFloat (NF4) quantization, the 34B Instruct and Python variants used about 20 GB for inference:
CodeLlama PR merged with 4-bit quantization inference benchmarks
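The NF4 scheme that bnb applies can be sketched in a few lines of NumPy: each block of weights is scaled by its absolute maximum and snapped to the nearest of 16 fixed levels (the codebook values below are approximate, taken from the QLoRA paper; this is an illustrative sketch, not bitsandbytes' actual kernel). A quick back-of-the-envelope calculation also shows why 34B parameters land near the ~20 GB observed above once the per-block scales and a few unquantized layers are added on top:

```python
import numpy as np

# Approximate NF4 codebook (16 levels): quantiles of a standard normal,
# values taken from the QLoRA paper (rounded here for illustration).
NF4_LEVELS = np.array([
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
])

def nf4_quantize(x, block_size=64):
    """Blockwise absmax quantization: scale each block into [-1, 1],
    then map every value to the index of the nearest NF4 level."""
    x = x.reshape(-1, block_size)
    absmax = np.abs(x).max(axis=1, keepdims=True)   # one fp scale per block
    scaled = x / absmax
    idx = np.abs(scaled[..., None] - NF4_LEVELS).argmin(axis=-1)
    return idx.astype(np.uint8), absmax             # 4-bit codes + scales

def nf4_dequantize(idx, absmax):
    """Look up the level for each 4-bit code and rescale by the block absmax."""
    return NF4_LEVELS[idx] * absmax

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)        # stand-in weight tensor
codes, scales = nf4_quantize(w)
w_hat = nf4_dequantize(codes, scales)
print("max abs round-trip error:", np.abs(w - w_hat.reshape(-1)).max())

# Rough memory estimate: 34B weights at 4 bits (0.5 bytes) per weight.
print("34B weights at 4 bits ≈ %.1f GB" % (34e9 * 0.5 / 1e9))
```

The 4-bit weights alone come to 17 GB; the per-block absmax scales and the layers kept in higher precision account for the remaining few GB.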