2. Sacrificing throughput for latency
Originally, they focused on TTFT (Time To First Token), but realized that TBT (Time Between Token) hurt them more, especially with Chain-of-Thought queries where users don’t see the intermediate outputs. They found that TTFT and TBT
Trading Throughput for Latency: LLM Performance Optimization
By
–