Yeah, this was at 50k contexts. Decode is about 80 tok/sec at 1k contexts. Prefill is up to 3000 tok/sec at
Decode 80 tok/sec at 1k, prefill 3000 tok/sec at 50k contexts