AI Dynamics

Global AI News Aggregator

About

LLMs maximize likelihood via cross-entropy training

Yes. For one, “Bayesian inference” is not Bayesian. Every (frequentist) n-gram model uses Bayes’ theorem. For another, LLMs have high capacity and are trained to minimize cross-entropy, which is equivalent to maximizing likelihood, so it’s not surprising they produce accurate

→ View original post on X — @pmddomingos,