The 6 "modes" of LLM inference, over time:
– 2021: "large" models
– 2023: "turbo"/"mini" models
– Apr 2024: Batch API
– Sep 2024: Reasoning models
– Oct 2024: Realtime API
– Nov 2024: Speculative Decoding APIs

This list seems pretty comprehensive. What else is coming?
Six LLM Inference Modes: Evolution and Future Trends