AI Dynamics

Global AI News Aggregator

Tiktoken Tokenization: Why Not Handle Unknown Characters Gracefully?

Something I don't understand here: surely it would make sense to keep the ability to tokenize that string as a sequence of integers representing the pieces that make it up, rather than throwing an error? Any idea why tiktoken doesn't do that?
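To make the behavior being asked for concrete, here is a minimal sketch of a tokenizer that falls back to raw UTF-8 bytes instead of raising an error. It is a toy illustration only and not how tiktoken is implemented: the vocabulary `TOY_VOCAB`, the function names `encode_with_byte_fallback` and `decode`, and the choice to reserve IDs 0-255 for bytes are all assumptions made for this example.

```python
# Toy illustration only: this is NOT tiktoken's implementation or API.
# Hypothetical vocabulary: IDs 0-255 are reserved for raw bytes,
# and a few multi-character pieces get IDs from 256 upward.
TOY_VOCAB = {"hello": 256, " world": 257, "token": 258}
MAX_PIECE_LEN = max(len(piece) for piece in TOY_VOCAB)


def encode_with_byte_fallback(text: str) -> list[int]:
    """Greedy longest-match encoding that never raises on unfamiliar input."""
    ids: list[int] = []
    i = 0
    while i < len(text):
        # Try the longest known piece starting at position i.
        for length in range(min(MAX_PIECE_LEN, len(text) - i), 0, -1):
            piece = text[i : i + length]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                i += length
                break
        else:
            # No known piece matched: emit the character's UTF-8 bytes as IDs 0-255.
            ids.extend(text[i].encode("utf-8"))
            i += 1
    return ids


def decode(ids: list[int]) -> str:
    """Invert the encoding by mapping every ID back to its bytes."""
    id_to_piece = {v: k.encode("utf-8") for k, v in TOY_VOCAB.items()}
    raw = b"".join(id_to_piece[t] if t >= 256 else bytes([t]) for t in ids)
    return raw.decode("utf-8")


if __name__ == "__main__":
    ids = encode_with_byte_fallback("hello é")
    print(ids)
    print(decode(ids))
```

In this sketch, any string can be encoded and decoded losslessly, because every character has a UTF-8 byte representation even when no vocabulary piece covers it. Whether a real tokenizer should behave this way for every kind of input is exactly the design question the post raises.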

→ View original post on X — @simonw