AI Dynamics

Global AI News Aggregator

Llama2-7B Reaches 30 Tokens Per Second On-Device

Here we go. Llama2-7B at 30 tokens per second ON CHIP. Give it 2 more cycles and we’ll see the larger models on device at these speeds. LLMs will be as omnipresent as calculators, simple maths.

→ View original post on X — @linusekenstam