If "uberinternal" WAS a token, that would tell you it was one of the top ~30,000 most common character sequences in the text they used to build the tokenizer, which is a different corpus from the training set used to train the model. "uberinternal" not being a token doesn't tell you much.
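To make the asymmetry concrete, here is a rough sketch of a toy byte-pair-encoding trainer. It is not the tokenizer any real model uses: the function names (`train_bpe`, `apply_merge`, `encode`), the tiny corpora, and the merge budget of 12 (standing in for the ~30,000 vocabulary limit) are all made up for illustration.

```python
from collections import Counter


def apply_merge(symbols: tuple, pair: tuple) -> tuple:
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus: str, num_merges: int) -> list:
    """Learn up to `num_merges` merges from a whitespace-split corpus (toy BPE)."""
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair wins
        merges.append(best)
        merged = Counter()
        for symbols, freq in words.items():
            merged[apply_merge(symbols, best)] += freq
        words = merged
    return merges


def encode(word: str, merges: list) -> list:
    """Tokenize one word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


# Hypothetical tokenizer-building corpora (NOT the model's training set).
corpus_with = "uberinternal " * 50 + "the cat sat on the mat"
corpus_without = "the cat sat on the mat " * 50

merges_with = train_bpe(corpus_with, num_merges=12)
merges_without = train_bpe(corpus_without, num_merges=12)

# Frequent in the tokenizer corpus: the string ends up as a single token.
print(encode("uberinternal", merges_with))
# Absent from the tokenizer corpus: it falls back to smaller pieces.
print(encode("uberinternal", merges_without))
```

In the second case the string splits into pieces purely because of what went into the tokenizer-building text. The model's actual training set never appears anywhere in this code, which is why the absence of a token says so little about it.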
Tokenizer Training Data: Understanding Token Presence Significance