It should work on CPU, CUDA, and MPS backends. As for hardware requirements, loading in fp16/bf16 takes about 2 bytes per parameter, so:
1B: ~2 GB VRAM
600M: ~1.2 GB VRAM
350M: ~700 MB VRAM
125M: ~250 MB VRAM
Of course, at lower quantization levels (Q4/Q8) you reduce this even further.
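The figures above follow directly from the 2-bytes-per-parameter rule. A minimal sketch of that back-of-the-envelope estimate (the function name and the 0.5 bytes/param figure for Q4 are illustrative assumptions, not from any particular library):

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Rough load-size estimate: parameter count times bytes per parameter.

    fp16/bf16 use 2 bytes per parameter; Q8 is ~1 byte, Q4 is ~0.5 bytes.
    This covers weights only, not activations or KV cache.
    """
    return params_billion * 1e9 * bytes_per_param / 1e9


# fp16/bf16 estimates for the sizes listed above
for p in [1.0, 0.6, 0.35, 0.125]:
    print(f"{p}B params -> ~{estimate_vram_gb(p):.2f} GB")

# The same 1B model at an assumed ~0.5 bytes/param for Q4
print(f"1B at Q4 -> ~{estimate_vram_gb(1.0, bytes_per_param=0.5):.2f} GB")
```

Actual usage at load time will be somewhat higher, since the runtime also allocates activations and framework overhead on top of the weights.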
VRAM Requirements for AI Models Across Hardware Architectures