AI Dynamics

Global AI News Aggregator

Groq LPU Engine: Kernel-free Llama 2 70B Inference Performance

Kernel-free, no CUDA, compiler-only solution.
The current implementation, running on our LPU Inference Engine, uses our own processor. The model is Llama 2 70B, 4k sequence length, at FP16… We haven't even hit the next gear through those other methods. Plenty left in the tank!
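To put "Llama 2 70B at FP16" in perspective, a quick back-of-the-envelope estimate shows why the weight footprint alone is substantial. This is a rough sketch assuming 70 billion parameters and 2 bytes per FP16 weight; it ignores KV cache, activations, and any runtime overhead.

```python
# Rough weight-memory estimate for Llama 2 70B at FP16.
# Assumptions: 70e9 parameters, 2 bytes per FP16 value;
# KV cache and activation memory are not counted.
params = 70e9
bytes_per_param = 2  # FP16 = 16 bits = 2 bytes

weight_bytes = params * bytes_per_param
print(f"Weights alone: ~{weight_bytes / 1e9:.0f} GB")
```

Around 140 GB for the weights alone, which is why serving this model at speed is a systems problem as much as a model problem.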

→ View original post on X — @groqinc
