AI Dynamics

Global AI News Aggregator

CodeLlama PR merged with 4-bit quantization inference benchmarks

The CodeLlama PR just got merged: https://github.com/Lightning-AI/lit-gpt/pull/472
When I tried it with bitsandbytes' 4-bit Normal Float (NF4) quantization, the 34B Instruct and Python variants used about 20 GB for inference.
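The ~20 GB figure is roughly what you'd expect from a back-of-the-envelope calculation: 34B parameters at 4 bits each is about 17 GB of weights, with the remainder going to quantization constants, the KV cache, and activations. A minimal sketch of that arithmetic (the helper name and the 1 GB = 1e9 bytes convention are our own assumptions, not from the post):

```python
# Rough memory estimate for quantized weights.
# Assumption: weights dominate inference memory; overhead is ignored here.
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (using 1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

fp16_gb = weight_memory_gb(34e9, 16)  # half precision: ~68 GB
nf4_gb = weight_memory_gb(34e9, 4)    # 4-bit quantized: ~17 GB
print(f"fp16: {fp16_gb:.0f} GB, 4-bit: {nf4_gb:.0f} GB")
```

So 4-bit quantization cuts weight memory by 4x versus fp16, which is what makes 34B-class models fit on a single 24 GB consumer GPU; the reported ~20 GB is consistent with ~17 GB of weights plus runtime overhead.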

→ View original post on X — @rasbt
