All models are bf16. Re: context, I know that in the past we have restricted usage during spikes; I would need to double-check whether that applies to 3.3. We do try to maximise the context length where possible. Btw, everything is powered by TGI v3:
All Models BF16: Context Length Maximization via TGI v3