This feature really does change things. Gemini had cache, but only for costs — Claude claims 90% less cost *and* 85% less latency for cached tokens. Even if it can’t replace all SFT, it’s so much easier to iterate on you probably want to exhaust caching-based strategies first.
Models using cached tokens greatly reduce cost and latency
By
–