Insufficient Training Data for Spelling Tasks in LLMs

I think the internet naturally doesn’t contain enough training data for spelling tasks, relative to how hard those tasks are for an LLM. The reason is how text is chopped up into sequences of chunks (tokens), each of which is a single distinct unit: the model operates on token IDs, not on individual characters. I have a whole video on Tokenization.
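A toy sketch of why this matters. The vocabulary and the greedy longest-match rule below are hypothetical stand-ins (real tokenizers such as BPE learn ~100k chunks from data), but they show the core issue: a word arrives at the model as a couple of opaque chunks, not as a sequence of letters.

```python
# Hypothetical toy vocabulary; real LLM vocabularies have ~100k entries
# learned from data, not hand-written like this.
VOCAB = {"straw", "berry", "st", "raw", "b", "e", "r", "y"}

def tokenize(text):
    """Greedily match the longest vocabulary entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try longest piece first
            piece = text[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"no vocab entry covers {text[i]!r}")
    return tokens

print(tokenize("strawberry"))  # ['straw', 'berry']
```

The model sees two chunks, not ten characters, so a question like "how many r's are in strawberry?" asks it to recover character-level structure that the tokenization has hidden from it.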