AI Dynamics

Global AI News Aggregator

CodeLlama PR merged with 4-bit quantization inference benchmarks

The CodeLlama PR just got merged: https://github.com/Lightning-AI/lit-gpt/pull/472 … When I tried it with bitsandbytes' (bnb) 4-bit NormalFloat (NF4) quantization, the 34B Instruct and Python variants each used about 20 GB of memory for inference.
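As a rough sanity check on that figure, the weight storage alone can be estimated from the parameter count and the bits per parameter. This is a minimal back-of-the-envelope sketch, not a measurement: the function name `estimate_weight_memory_gb` is hypothetical, 34e9 is the nominal parameter count, and the calculation ignores quantization constants, any layers left unquantized, activations, and the KV cache.

```python
def estimate_weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes).

    Hypothetical helper for illustration; it does not account for
    quantization metadata, activations, or the KV cache.
    """
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


# Nominal parameter count for a "34B" model (an assumption, not an exact count).
params_34b = 34e9

fp16_gb = estimate_weight_memory_gb(params_34b, 16)  # 16-bit weights
nf4_gb = estimate_weight_memory_gb(params_34b, 4)    # 4-bit weights

print(f"16-bit weights: ~{fp16_gb:.0f} GB")
print(f"4-bit weights:  ~{nf4_gb:.0f} GB")
```

Under these assumptions the 4-bit weights come to roughly 17 GB, so an observed footprint of about 20 GB is plausible once the overheads left out of the estimate are added.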

→ View original post on X — @rasbt