AI Dynamics

Global AI News Aggregator


Custom Taalas hardware runs Llama-3.1-8B at 17k tokens per second

Custom hardware from Taalas runs Llama-3.1-8B at 17k tokens per second.
17k. Absolutely insane. (For the record, Cerebras is crazy good, and they're at 2k on the same model.)
And latency is very low too! Their chatbot is here: https://chatjimmy.ai
It's genuinely an eerie

→ View original post on X — @aymericroucher