5/5 What this unlocks: – Change one prompt, serve cached results for all other components.
– A/B test any component against identical upstream outputs.
– Run best-of-N inference without N branches collapsing to the same answer. Full design methodology here:
AI21 Labs unveils new prompt caching capabilities
By
–