UTF-8 I already knew about the "confusables", e.g.: e vs. е. Which look ~same but are different. But you can also smuggle arbitrary byte streams in any character via "variation selectors". So this emoji: 󠅧󠅕󠄐󠅑󠅢󠅕󠄐󠅓󠅟󠅟󠅛󠅕󠅔 is 53 tokens. Yay https://paulbutler.org/2025/smuggling-arbitrary-data-through-an-emoji/
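A rough sketch of the trick, in Python. The byte-to-selector mapping below is my reading of the scheme in the linked post (bytes 0-15 go to U+FE00..U+FE0F, bytes 16-255 go to U+E0100..U+E01EF), so treat it as an assumption and not a spec. The function names and the 😀 carrier are made up for illustration.

```python
import unicodedata
from typing import Optional

# Unicode defines 256 variation selectors:
#   VS1..VS16    -> U+FE00..U+FE0F
#   VS17..VS256  -> U+E0100..U+E01EF
# Assumed mapping: byte value b <-> variation selector number b + 1.

def byte_to_selector(b: int) -> str:
    """Map one byte (0..255) to a variation selector code point."""
    if not 0 <= b <= 255:
        raise ValueError("expected a byte in 0..255")
    return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + (b - 16))

def selector_to_byte(ch: str) -> Optional[int]:
    """Inverse of byte_to_selector; None for any other character."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None

def encode(base: str, payload: bytes) -> str:
    """Append one invisible selector per payload byte to a visible base."""
    return base + "".join(byte_to_selector(b) for b in payload)

def decode(text: str) -> bytes:
    """Collect every variation selector in text and map it back to a byte."""
    found = (selector_to_byte(ch) for ch in text)
    return bytes(b for b in found if b is not None)

# Hypothetical carrier: one visible emoji plus a hidden payload.
carrier = encode("\U0001F600", b"we are cooked")
print(len(carrier))                  # 14 code points: 1 emoji + 13 selectors
print(len(carrier.encode("utf-8")))  # 56 bytes, 4 per code point here
print(decode(carrier))               # b'we are cooked'

# The "confusables" case: two different code points, near-identical glyphs.
print(unicodedata.name("e"))       # LATIN SMALL LETTER E
print(unicodedata.name("\u0435"))  # CYRILLIC SMALL LETTER IE
```

Every payload byte costs four UTF-8 bytes once it lands in the U+E0100 range, which goes some way to explaining why one "emoji" can balloon into dozens of tokens. The exact token count depends on the tokenizer.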
…
Unicode Variation Selectors Enable Arbitrary Data Smuggling in Emojis