The reason LLMs say there are two r's in "strawberry" isn't (just) tokenization; they struggle with counting in general, e.g. "horse" in the example shown. The paper in the quoted post below offers the best intuition I've seen for why this happens: Transformers can't count because
