AI Dynamics

Global AI News Aggregator

Trading Throughput for Latency: LLM Performance Optimization

2. Sacrificing throughput for latency
Originally, they focused on TTFT (Time To First Token), but realized that TBT (Time Between Tokens) hurt them more, especially with Chain-of-Thought queries, where users don’t see the intermediate outputs. They found that TTFT and TBT
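To make the two metrics concrete, here is a minimal Python sketch of how TTFT and TBT could be computed from token arrival timestamps. The function names, the use of a mean for TBT, and the simple model of hidden reasoning tokens (the user waits roughly TTFT plus one TBT per hidden token) are illustrative assumptions, not something stated in the original post.

```python
from statistics import mean


def latency_metrics(request_sent, token_times):
    """Return (ttft, mean_tbt) in seconds.

    request_sent: time the request was issued.
    token_times: arrival time of each generated token, in order.
    Both names are hypothetical, chosen for this sketch.
    """
    if not token_times:
        raise ValueError("at least one token timestamp is required")
    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_sent
    # TBT: gap between consecutive tokens, averaged over the response.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    mean_tbt = mean(gaps) if gaps else 0.0
    return ttft, mean_tbt


def time_to_visible_output(ttft, mean_tbt, hidden_tokens):
    """Assumed model: with Chain-of-Thought, the first `hidden_tokens`
    tokens are not shown, so the user waits for all of them."""
    return ttft + hidden_tokens * mean_tbt


ttft, tbt = latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25])
print(ttft, tbt)
print(time_to_visible_output(ttft, tbt, hidden_tokens=200))
```

Under this assumed model, the wait before anything visible appears grows with the number of hidden tokens times TBT, while TTFT contributes only once, which is one way to read why TBT mattered more for Chain-of-Thought queries.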

→ View original post on X: @chipro