
+1 for this approach. GPT-3’s tokenization is a big gotcha. Model doesn’t see text as sequences of characters unless you add spacing/hyphens. It can’t even reliably tell you the last letter of a word without first rewriting it that way. Similar for counting length:
