Bytes are still tokens. You’d still have tokenization issues, just new and likely worse ones. Note that GPT-4 handles digits specially: its cl100k_base tokenizer splits a run of digits into chunks of at most three, one token per chunk. Even in that special case, they didn’t go down to single digits.
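A small sketch of both points, assuming a simplified model. The digit rule mirrors the `\p{N}{1,3}` piece of the cl100k_base pre-tokenizer pattern, but uses `\d` because Python's `re` module has no `\p{N}`. The byte-level "tokenizer" just emits one token per UTF-8 byte. The helper names `digit_chunks` and `byte_tokens` are hypothetical.

```python
import re

# Simplified stand-in for the digit rule in cl100k_base's pre-tokenizer
# (\p{N}{1,3}); Python's re lacks \p{N}, so \d is used instead.
DIGIT_CHUNK = re.compile(r"\d{1,3}")


def digit_chunks(number: str) -> list[str]:
    """Split a run of digits left to right into chunks of up to three."""
    return DIGIT_CHUNK.findall(number)


def byte_tokens(text: str) -> list[int]:
    """Hypothetical byte-level 'tokenizer': one token per UTF-8 byte."""
    return list(text.encode("utf-8"))


print(digit_chunks("1234567"))  # ['123', '456', '7']
print(byte_tokens("naïve"))     # [110, 97, 195, 175, 118, 101]
print(len("naïve"), len(byte_tokens("naïve")))  # 5 6
```

The chunk boundaries run left to right, so `1234567` splits as `123|456|7` rather than along place-value groups. Going to bytes doesn't remove boundaries: the single character `ï` becomes two tokens, and sequences get longer.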