AI Dynamics

Global AI News Aggregator

Unicode Variation Selectors Enable Arbitrary Data Smuggling in Emojis

UTF-8 I already knew about the "confusables", e.g.: e vs. е. Which look ~same but are different. But you can also smuggle arbitrary byte streams in any character via "variation selectors". So this emoji: 󠅧󠅕󠄐󠅑󠅢󠅕󠄐󠅓󠅟󠅟󠅛󠅕󠅔 is 53 tokens. Yay https://paulbutler.org/2025/smuggling-arbitrary-data-through-an-emoji/
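
To make the trick concrete, here is a minimal Python sketch of how bytes could be hidden behind a visible character. It assumes a byte-to-selector mapping along the lines of the one described in the linked post: bytes 0 to 15 map to the variation selectors U+FE00..U+FE0F, and bytes 16 to 255 map to the supplementary selectors U+E0100..U+E01EF. The function names (`hide`, `reveal`, and the two helpers) are made up for illustration and do not come from the post or from any library.

```python
from typing import Optional


def byte_to_selector(b: int) -> str:
    """Map one byte to one of the 256 Unicode variation selectors (assumed mapping)."""
    if not 0 <= b <= 255:
        raise ValueError("expected a byte value in 0..255")
    if b < 16:
        return chr(0xFE00 + b)          # VS1..VS16
    return chr(0xE0100 + (b - 16))      # VS17..VS256


def selector_to_byte(ch: str) -> Optional[int]:
    """Inverse of byte_to_selector; returns None for any other character."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None


def hide(base: str, payload: bytes) -> str:
    """Append one invisible selector per payload byte after the visible base text."""
    return base + "".join(byte_to_selector(b) for b in payload)


def reveal(text: str) -> bytes:
    """Collect every variation selector in the text and turn it back into bytes."""
    out = bytearray()
    for ch in text:
        b = selector_to_byte(ch)
        if b is not None:
            out.append(b)
    return bytes(out)


carrier = hide("\U0001F600", b"hello")   # grinning face plus 5 hidden bytes
print(len(carrier))                      # 6 code points: 1 visible + 5 selectors
print(len(carrier.encode("utf-8")))      # 24 bytes in UTF-8
print(reveal(carrier))                   # b'hello'
```

Most renderers draw `carrier` as a single emoji, yet it occupies 24 bytes in UTF-8. That hidden length is a plausible reason a tokenizer operating on bytes would report far more tokens than the visible text suggests.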

→ View original post on X by @karpathy
