AI Dynamics

Global AI News Aggregator

Llama 3.2 Quantized Versions Speed Up Inference 2-4x

We want to make it easier for more people to build with Llama, so today we're releasing new quantized versions of Llama 3.2 1B and 3B that deliver up to a 2-4x increase in inference speed and, on average, a 56% reduction in model size and a 41% reduction in memory footprint.

→ View original post on X — @aiatmeta