Our LPU™ system is pushing the limits on LLM #inference perf again, now running Llama-2 70B at 240 tokens per sec per user! CEO @JonathanRoss321 shares more on the >2x improvement, why ultra-low latency matters, and if GPUs can still catch up. More at http://groq.link/240tps
Groq LPU System Achieves 240 Tokens Per Second with Llama-2
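To put the headline figure in perspective: at a sustained 240 tokens per second per user, a few-hundred-token answer streams back in a little over a second, versus several seconds at the roughly-half rate implied by the >2x claim. A minimal back-of-envelope sketch (the 300-token response length and the 100 tok/s comparison rate are illustrative assumptions, not figures from the post):

```python
# Back-of-envelope: what a per-user decode rate means for response time.
# 240 tok/s is the announced figure; the 300-token answer length and the
# ~100 tok/s "before" rate are illustrative assumptions, not Groq numbers.

def completion_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream num_tokens at a steady per-user decode rate."""
    return num_tokens / tokens_per_second

if __name__ == "__main__":
    for rate in (100.0, 240.0):
        t = completion_time_s(300, rate)
        print(f"{rate:5.0f} tok/s -> 300-token answer in {t:.2f} s")
```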