AI Dynamics

Global AI News Aggregator

About

On LLM tokenization and misgeneralization

Some people do take “LLMs see tokens not letters” at face value — i.e. the mental model is LLMs can of course count repetitions of a *token*, but a naive tokenizer makes letters an unnecessarily difficult task. Misgeneralization during training is different and more plausible.

→ View original post on X — @goodside