Tokenization issues and GPT-4 digit tokenization

Bytes are still tokens. With a byte-level vocabulary you'd still have tokenization issues, just new and likely worse ones. Note that GPT-4's tokenizer handles digits specially, splitting them into groups of up to three digits per token. Even in that special case they didn't go down to single digits.
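A minimal sketch of both points, using only the Python standard library. It shows that a byte-level vocabulary still produces tokens (one per UTF-8 byte, so a non-ASCII character costs several), and that digit runs can be chunked into fixed-size groups instead of single digits. The function names and the `group` parameter are illustrative and not part of any tokenizer's API. The default of 3 mirrors the `\p{N}{1,3}` rule in tiktoken's published `cl100k_base` split pattern; the real tokenizer then applies BPE merges, which this sketch does not model.

```python
import re


def byte_tokens(text: str) -> list[int]:
    """Treat every UTF-8 byte as its own token (vocabulary size 256)."""
    return list(text.encode("utf-8"))


def digit_group_tokens(text: str, group: int = 3) -> list[str]:
    """Split text into digit chunks of at most `group` digits, plus non-digit runs.

    This is only a pre-tokenization split, scanned left to right; it is an
    assumption-laden stand-in for a real tokenizer, not a reimplementation.
    """
    pattern = re.compile(r"\d{1,%d}|\D+" % group)
    return pattern.findall(text)


if __name__ == "__main__":
    sample = "héllo 2024"
    print(len(sample), "characters ->", len(byte_tokens(sample)), "byte tokens")
    print(digit_group_tokens("12345678"))           # groups of up to three
    print(digit_group_tokens("12345678", group=2))  # groups of up to two
```

Because the split runs left to right, chunk boundaries do not line up with thousands separators: `12345678` becomes `123`, `456`, `78`. This is one way grouped digits can still cause arithmetic trouble.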

→ View original post on X: @goodside