Some people do take “LLMs see tokens not letters” at face value — i.e. the mental model is LLMs can of course count repetitions of a *token*, but a naive tokenizer makes letters an unnecessarily difficult task. Misgeneralization during training is different and more plausible.