How do you quantize LLMs with tens of billions of parameters while avoiding the sharp drop in performance seen when quantizing models > 6B params?
Cohere (@cohere), December 19, 2023
Listen to @aahmadian_ highlight a correct training optimization recipe (which leads to no drop in performance when quantizing vs. a… pic.twitter.com/iU1mliAjel