oLLM is a lightweight Python library for LLM inference built on top of transformers. Run Qwen3-Next-80B, GPT-OSS, and Llama3 on consumer hardware.
↳ Handle 100k tokens on an 8GB GPU
↳ Works with contracts, logs, reports
↳ No quantization, just fp16/bf16
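A quick back-of-envelope calculation shows why running an 80B-parameter model at fp16 on an 8 GB GPU requires streaming weights rather than holding them resident. This is a generic sketch of the arithmetic, not oLLM's code; the helper function and numbers are illustrative.

```python
# Back-of-envelope: why 80B parameters at fp16 cannot fit in 8 GB of VRAM.
# fp16/bf16 use 2 bytes per parameter, so weights alone dwarf consumer GPUs.

def model_size_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight footprint in GB for a given parameter count."""
    return n_params * bytes_per_param / 1e9

weights_gb = model_size_gb(80e9)  # 80B params at fp16
gpu_gb = 8                        # typical consumer GPU VRAM

print(f"fp16 weights: {weights_gb:.0f} GB vs {gpu_gb} GB VRAM")
# Only a small slice of the model fits at once, so weights (and the KV
# cache for 100k-token contexts) must be streamed in from disk or RAM
# layer by layer during inference.
```

The same arithmetic explains the "no quantization" claim: rather than shrinking weights to fit, the approach keeps full fp16/bf16 precision and trades latency for memory by offloading.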
oLLM: Lightweight Python Library for LLM Inference