oLLM: Lightweight Library for Local LLM Inference

oLLM is a lightweight Python library for local large-context LLM inference. It runs gpt-oss-20B, Qwen3-next-80B, and Llama-3.1-8B on a ~$200 consumer GPU with just 8 GB of VRAM, without any quantization: weights stay in fp16/bf16 precision. 100% open source.
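Fitting a 20B+ model's fp16 weights into 8 GB without quantization implies the whole model never sits in VRAM at once; the usual approach is to stream weights from SSD/CPU to the GPU one transformer layer at a time, so peak memory is roughly one layer's weights plus activations. The sketch below illustrates that general pattern in plain PyTorch; run_layer_streamed and the per-layer checkpoint files are hypothetical illustrations, not oLLM's actual API.

import torch

def run_layer_streamed(hidden, layer_files, device="cuda"):
    # Apply each transformer layer to `hidden`, loading its weights on demand.
    # `layer_files` is a hypothetical list of per-layer checkpoints saved with
    # torch.save(); a real implementation would use safetensors and async disk I/O.
    hidden = hidden.to(device, torch.bfloat16)
    for path in layer_files:
        layer = torch.load(path, map_location="cpu")  # weights stay on disk until needed
        layer = layer.to(device, torch.bfloat16)      # stream this one layer into VRAM
        with torch.no_grad():
            hidden = layer(hidden)                    # forward through just this layer
        del layer                                     # release the layer's VRAM...
        torch.cuda.empty_cache()                      # ...before loading the next one
    return hidden

The trade-off is speed for memory: every token pass re-reads the weights from storage, which is why this style of inference favors fast NVMe SSDs.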