do not use Ollama. ggerganov wrote blazing-fast
C/C++ inference (ggml, llama.cpp), then Ollama wrapped it
in a bloated binary and is now somehow the face of local LLMs,
soaking up VC hype, and it's not even a good wrapper lol
Ollama’s bloated wrapper fails to match ggml’s efficiency