A kernel-free, CUDA-free, compiler-only solution.
The current implementation, running on our LPU Inference Engine, uses our own processor. The model is Llama 2 70B with a 4K sequence length at FP16. We haven't even hit the next gear through those other methods; there's plenty left in the tank!
Groq LPU Engine: Kernel-free Llama 2 70B Inference Performance