AI Dynamics

Global AI News Aggregator

Token Encoding and Decoding: Asymmetric Complexity in LLMs

Decoding (tokens -> string) is just a lookup table and string concatenation. Encoding (string -> tokens) is a pain. For SentencePiece, I *think* llama2.c has a simple implementation that probably works, but I'm not 100% sure: https://github.com/karpathy/llama2.c/blob/master/run.c#L452 … For tiktoken-style, the problem is the

→ View original post on X (@karpathy)
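The asymmetry the post describes can be sketched in a few lines. Below is a toy illustration, not the actual llama2.c code: decoding is a plain table lookup plus concatenation, while encoding uses a SentencePiece-style greedy merge loop (repeatedly merge the adjacent pair with the best vocabulary score), which is the general approach taken in run.c. The vocabulary, scores, and function names here are all made-up assumptions for demonstration.

```python
# Toy vocabulary: token string -> (id, score). Entirely an assumption
# for illustration; real tokenizers load this from a model file.
vocab_scores = {
    "l": (0, 0.0), "o": (1, 0.0), "w": (2, 0.0),
    "lo": (3, 1.0), "low": (4, 2.0),
}
id_to_token = {i: tok for tok, (i, _) in vocab_scores.items()}

def decode(token_ids):
    # Decoding really is just lookup plus string concat.
    return "".join(id_to_token[t] for t in token_ids)

def encode(text):
    # SentencePiece-style greedy merging: start from single characters,
    # then repeatedly merge the adjacent pair whose concatenation exists
    # in the vocabulary with the highest score.
    tokens = list(text)
    while True:
        best = None  # (score, index) of the best mergeable pair
        for i in range(len(tokens) - 1):
            merged = tokens[i] + tokens[i + 1]
            if merged in vocab_scores:
                score = vocab_scores[merged][1]
                if best is None or score > best[0]:
                    best = (score, i)
        if best is None:
            break  # no mergeable pairs left
        _, i = best
        tokens = tokens[:i] + [tokens[i] + tokens[i + 1]] + tokens[i + 2:]
    return [vocab_scores[t][0] for t in tokens]

print(encode("low"))          # [4]
print(decode(encode("low")))  # low
```

Note how `decode` is a one-liner while `encode` needs a search loop; a real implementation also has to handle bytes outside the vocabulary and, for tiktoken-style BPE, merge-rank ties rather than scores.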
