Optimize Multiple LLM Calls by Mixing Models for Lower Latency

Most LLM apps and AI agents of even moderate complexity make multiple calls to an LLM per task. Routing every one of those calls to a frontier model like GPT-4 or Claude is impractical: the latencies compound with each call, and you will soon be in high-latency hell. The better approach is to mix and match models, routing each call to an LLM that matches the latency and capability that call actually requires.
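As a minimal sketch of this idea, the router below sends easy prompts to a fast, cheap model and reserves the slow, stronger model for hard ones. The model names and the complexity heuristic are illustrative assumptions, not a real API; in practice you would replace them with your providers' model identifiers and a heuristic tuned to your workload.

```python
# Latency-aware model routing: a sketch, not production code.
# Model names below are placeholders, not real model identifiers.
FAST_MODEL = "small-fast-model"      # low latency, fine for simple steps
STRONG_MODEL = "large-strong-model"  # higher latency, reserved for hard steps

def estimate_complexity(prompt: str) -> float:
    """Crude heuristic: long prompts and reasoning keywords signal a hard call."""
    keywords = ("analyze", "explain why", "step by step", "compare")
    score = min(len(prompt) / 500, 1.0)
    if any(k in prompt.lower() for k in keywords):
        score += 0.5
    return min(score, 1.0)

def pick_model(prompt: str, threshold: float = 0.5) -> str:
    """Route easy prompts to the fast model, hard ones to the strong model."""
    if estimate_complexity(prompt) >= threshold:
        return STRONG_MODEL
    return FAST_MODEL
```

With this in place, a multi-step agent can keep simple steps (reformatting, extraction, short classification) on the fast model and only pay the strong model's latency on the few steps that need it.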