If this doesn’t make you bullish on Open Source, I don’t know what will! That’s a 32B LLM that can easily fit on a ~0.8 USD/hour GPU, spitting out an ungodly number of tokens. Back-of-the-napkin math (weights only; the KV cache and runtime overhead need extra headroom):
– fp16/bf16: ~64GB VRAM (too big for a single 48GB L40S)
– 8-bit: ~32GB VRAM (would fit on an L40S)
– 4-bit: ~16GB VRAM (would fit on a 24GB L4)
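The arithmetic behind these estimates is just parameter count × bytes per parameter. Here is a minimal Python sketch of it, assuming a dense model where every parameter is stored at the listed precision and counting weights only. The helpers `weight_vram_gb` and `gpus_that_fit` and the two-GPU table are hypothetical, for illustration; the table uses each card's on-board memory (24GB for the L4, 48GB for the L40S).

```python
# Back-of-the-napkin, weights-only VRAM estimate for a dense 32B model.
# Assumption: KV cache, activations, and runtime overhead are ignored,
# so real deployments need headroom beyond these numbers.

BYTES_PER_PARAM = {"fp16/bf16": 2.0, "8-bit": 1.0, "4-bit": 0.5}
GPU_VRAM_GB = {"L4": 24, "L40S": 48}  # on-board memory per card, in GB


def weight_vram_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def gpus_that_fit(required_gb: float) -> list[str]:
    """Names of GPUs whose VRAM is at least the required amount."""
    return [gpu for gpu, vram in GPU_VRAM_GB.items() if vram >= required_gb]


if __name__ == "__main__":
    params = 32e9  # 32B parameters
    for precision, bpp in BYTES_PER_PARAM.items():
        need = weight_vram_gb(params, bpp)
        fits = ", ".join(gpus_that_fit(need)) or "neither"
        print(f"{precision:>9}: ~{need:.0f} GB -> fits on: {fits}")
```

Running it gives ~64GB, ~32GB, and ~16GB for fp16/bf16, 8-bit, and 4-bit: the fp16 weights fit on neither card, 8-bit needs the L40S, and 4-bit fits on the L4. Real 4-bit formats also store per-group scales, so actual checkpoints run a little larger than the raw estimate.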
32B Open Source LLM Fits Affordable GPU Hardware
