At production input lengths, the encoder cuts p50 latency by roughly 5× vs. HuggingFace tokenizers, 2× vs. SentencePiece C++, and 1.5× vs. IREE C. At 514 tokens, it runs in 63 µs with zero heap allocations.
AI Encoder Tokenizer Performance: 5× Latency Improvement
By
–
