AI Dynamics

Global AI News Aggregator

GPT-Fast Integrates Gemma with Optimized Token Performance

+/-12 lines of code in gpt-fast integrated Gemma from @GoogleDeepMind, and it shows:
* 234 tokens / sec on V100 in int8
* 144 tokens / sec on V100 in float precision

https://github.com/pytorch-labs/gpt-fast/commit/ef055fc12188eaf80d8ba948ad743ee5583d0f3c
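
To put the two figures in perspective, here is a minimal sketch of the arithmetic behind them. It is not part of gpt-fast: the helper names `speedup` and `ms_per_token` are hypothetical, and the per-token latency assumes a single decoding stream, so that latency is simply the reciprocal of throughput.

```python
# Throughput figures quoted in the post (tokens per second on a V100).
INT8_TOKENS_PER_SEC = 234.0
FLOAT_TOKENS_PER_SEC = 144.0


def speedup(fast: float, slow: float) -> float:
    """Return the ratio of two throughputs (hypothetical helper)."""
    return fast / slow


def ms_per_token(tokens_per_sec: float) -> float:
    """Return the average time per token in milliseconds.

    Assumes one decoding stream, so latency = 1 / throughput.
    """
    return 1000.0 / tokens_per_sec


ratio = speedup(INT8_TOKENS_PER_SEC, FLOAT_TOKENS_PER_SEC)
print(f"int8 vs float throughput: {ratio:.3f}x")
print(f"int8:  {ms_per_token(INT8_TOKENS_PER_SEC):.2f} ms per token")
print(f"float: {ms_per_token(FLOAT_TOKENS_PER_SEC):.2f} ms per token")
```

On these numbers, int8 delivers about 1.6 times the float throughput on the same GPU.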

→ View original post on X — @soumithchintala