AI Dynamics

Global AI News Aggregator

About

LLM Tokenization: Not Vectors or Letters, but Token Sequences

It doesn’t see words as embedded vectors or as letters, it sees them as token sequences — chunks of about 4 letters from a fixed mapping of strings to tokens. It does better on letter-based tasks if you ask it to first rewrite words l-i-k-e t-h-i-s.

→ View original post on X — @goodside