AI Dynamics

Global AI News Aggregator


GPT-4, Falcon, and Llama2 Tokenizer Comparison: Study Results

And our answers are out! Running on 1B tokens from the web (filtered and mostly English, as detailed in https://huggingface.co/papers/2306.01116…), we got:
– GPT4 tokenizer (100k vocab) gives you 0.997B tokens
– Falcon tokenizer (64k vocab) gives you ~5% more tokens (1.04B)
– Llama2 tokenizer
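A comparison like this comes down to running each tokenizer over the same corpus, summing the token counts, and dividing by a baseline. The sketch below is a minimal illustration of that bookkeeping, not the study's actual code. The helper names (`count_tokens`, `compare_tokenizers`) and the toy whitespace and character tokenizers are hypothetical stand-ins. In practice you would plug in real tokenizers instead, for example tiktoken's `cl100k_base` encoding, which is the GPT-4 tokenizer.

```python
from typing import Callable, Dict, Iterable, List, Sequence

# A tokenizer is any function mapping text to a sequence of tokens (strings or ids).
Tokenizer = Callable[[str], Sequence]


def count_tokens(corpus: Iterable[str], tokenize: Tokenizer) -> int:
    """Total number of tokens the tokenizer produces over every document."""
    return sum(len(tokenize(doc)) for doc in corpus)


def compare_tokenizers(
    corpus: List[str], tokenizers: Dict[str, Tokenizer], baseline: str
) -> Dict[str, float]:
    """Token count of each tokenizer relative to the baseline (1.0 = same count)."""
    counts = {name: count_tokens(corpus, tok) for name, tok in tokenizers.items()}
    base = counts[baseline]
    return {name: n / base for name, n in counts.items()}


# Hypothetical toy tokenizers, used only so the sketch runs without downloads.
def whitespace_tokenize(text: str) -> List[str]:
    return text.split()


def char_tokenize(text: str) -> List[str]:
    # One token per non-space character.
    return [c for c in text if not c.isspace()]


if __name__ == "__main__":
    corpus = ["the cat sat", "on the mat"]
    ratios = compare_tokenizers(
        corpus, {"ws": whitespace_tokenize, "char": char_tokenize}, baseline="ws"
    )
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.3f}x baseline tokens")
```

A ratio above 1.0 means the tokenizer splits the same text into more tokens than the baseline. This is how a result like "~5% more tokens" is expressed. Because the corpus is fixed, differences come only from vocabulary and merge rules, so the ratio depends on the corpus chosen (here, mostly English web text).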

Original post on X by @thom_wolf