Something I don't understand here: surely it would make sense to keep the ability to tokenize that string as a sequence of integers representing the pieces that make it up, rather than throwing an error? Any idea why tiktoken doesn't do that?
Tiktoken Tokenization: Why Not Handle Unknown Characters Gracefully?