Kog has open-sourced on @huggingface the 2B model they used to demonstrate a model running at over 3,000 tokens per second. Very cool work! https://huggingface.co/blog/kogai/kog-laneformer-2b-the-latency-first-model …
Kog publishes an ultra-fast 2B model on Hugging Face
