LLMs are hungry beasts: serving them fast is expensive. But what if we could cut the cost without cutting corners?
Tilus, a new GPU virtual machine, does just that by unleashing low-precision math at any bit width, not just powers of 2.
Tilus GPU VM cuts LLM serving costs with low-precision math
