You could 100% do that. Actually, I have a paragraph on that in the post: the problem is that a text of 100 characters would become 100 tokens (instead of ~20-30 tokens with a typical subword tokenizer). In other words, it would be wasteful, because with the same token limit you wouldn't be able to feed as much text into the LLM.
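To put rough numbers on this, here is a minimal Python sketch. The 4,096-token context window and the ~4 characters per subword token are assumptions for illustration only (4 chars/token matches the ~20-30 tokens per 100 characters mentioned above). The subword count is a back-of-the-envelope estimate, not the output of a real tokenizer.

```python
import math

# Hypothetical numbers for illustration only, not taken from any specific model.
CONTEXT_WINDOW = 4096          # assumed context window size, in tokens
SUBWORD_CHARS_PER_TOKEN = 4.0  # assumed average characters per subword token


def char_level_tokens(text: str) -> list[str]:
    """Character-level tokenization: every character (spaces included) is one token."""
    return list(text)


def estimated_subword_tokens(num_chars: int,
                             chars_per_token: float = SUBWORD_CHARS_PER_TOKEN) -> int:
    """Rough token count for a subword tokenizer, based on an assumed average."""
    return math.ceil(num_chars / chars_per_token)


def max_chars_in_context(context_tokens: int, chars_per_token: float) -> int:
    """How many characters of text fit into a context window of the given size."""
    return int(context_tokens * chars_per_token)


if __name__ == "__main__":
    text = "x" * 100  # stand-in for any 100-character text
    print(len(char_level_tokens(text)))                                   # 100 tokens
    print(estimated_subword_tokens(len(text)))                            # ~25 tokens
    print(max_chars_in_context(CONTEXT_WINDOW, 1.0))                      # 4096 chars
    print(max_chars_in_context(CONTEXT_WINDOW, SUBWORD_CHARS_PER_TOKEN))  # 16384 chars
```

Under these assumptions, the same window holds about 4,096 characters with character-level tokens versus roughly 16,000 with subword tokens, which is the waste described above.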
Character-level tokenization inefficiency and token limit constraints