Just a guess, but maybe “ouches” and “unction” fall just below some threshold on the probability of following a space that a string needs to earn a token? For example, “unction” would come from function names (where the “f” is swallowed by an unseparated prefix), and “ouch” often follows an open quote or a dash.
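One rough way to probe this guess on a corpus is to tally which character sits immediately before each occurrence of a candidate string, then look at the share preceded by a space. The helpers `preceding_chars` and `space_share` and the toy corpus are hypothetical illustrations; this is not how any real tokenizer is trained, just a frequency check under that assumption.

```python
from collections import Counter


def preceding_chars(corpus: str, piece: str) -> Counter:
    """Count the character right before each occurrence of `piece`.

    Hypothetical helper. Overlapping matches are included, and an
    occurrence at the very start of the corpus is counted under "^".
    """
    counts: Counter = Counter()
    idx = corpus.find(piece)
    while idx != -1:
        prev = corpus[idx - 1] if idx > 0 else "^"
        counts[prev] += 1
        idx = corpus.find(piece, idx + 1)
    return counts


def space_share(counts: Counter) -> float:
    """Fraction of occurrences preceded by a space (0.0 if none were found)."""
    total = sum(counts.values())
    return counts[" "] / total if total else 0.0


# Made-up toy corpus, not real training data.
corpus = 'call myfunction() then getfunction(); "ouches" and -ouches, a junction'

unction = preceding_chars(corpus, "unction")
ouch = preceding_chars(corpus, "ouch")
print(unction, space_share(unction))  # "unction" never follows a space here
print(ouch, space_share(ouch))        # "ouch" follows a quote or a dash
```

On a real corpus, a string whose space share is very low would rarely appear in the space-prefixed form, which is consistent with (though not proof of) the guess above.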
Speculation on token probabilities and tokenization artifacts