AI Dynamics

Global AI News Aggregator


ConvexTok: Linear Programming for Optimal LLM Tokenization

"Tokenisation via Convex Relaxations": Most LLM tokenizers still use BPE, a greedy merge algorithm that can waste vocabulary slots on tokens that look good locally but are globally suboptimal. This paper recasts tokenizer training as a linear program, then rounds the solution to obtain ConvexTok.
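For readers unfamiliar with the greedy merge step that the post criticises, here is a minimal sketch of textbook BPE training. It is not taken from the paper and does not show ConvexTok's linear program; the function name `train_bpe`, the toy corpus, and the lexicographic tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter


def train_bpe(words, num_merges):
    """Toy BPE trainer (illustrative only): greedily merge the most
    frequent adjacent symbol pair, num_merges times."""
    # Each word starts as a tuple of single characters, weighted by frequency.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair across the corpus.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break  # nothing left to merge
        # Greedy choice: highest count wins. Ties go to the
        # lexicographically smallest pair (an assumption, for determinism).
        best = min(pair_counts, key=lambda p: (-pair_counts[p], p))
        merged = best[0] + best[1]
        # Rewrite the corpus with the chosen pair fused into one symbol.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus[tuple(out)] += freq
        corpus = new_corpus
        merges.append(best)
    return merges


print(train_bpe(["abab", "abc"], 2))
```

Each merge is committed permanently based only on the current pair counts, so an early choice can occupy a vocabulary slot that a global view of the corpus would have spent differently. That is the locally good but globally suboptimal behaviour the post describes.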

→ View original post on X: @askalphaxiv