Context: While my LLMs from Scratch book explains and uses BPE tokenizers, I opted for the highly performant tiktoken library (which is used for GPT-4 and now also for Llama 3) for practical purposes: the book focuses on LLMs rather than tokenizer development.
LLMs from Scratch book uses tiktoken rather than a from-scratch BPE implementation for practical efficiency
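
For readers curious what "tokenizer development" would involve, here is a minimal sketch of byte-level BPE training in plain Python. It is a toy illustration under simplifying assumptions (no regex pre-splitting, no special tokens, no performance tuning) and is not how tiktoken is implemented, nor code from the book. The function names `most_frequent_pair`, `merge_pair`, `train_bpe`, and `decode` are hypothetical and chosen only for this example.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the most common adjacent pair of token ids, or None if there is none."""
    pairs = Counter(zip(ids, ids[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` with `new_id`, left to right."""
    merged, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            merged.append(new_id)
            i += 2
        else:
            merged.append(ids[i])
            i += 1
    return merged


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text` (toy version)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new token id
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:  # fewer than two tokens left, nothing to merge
            break
        new_id = 256 + step  # ids 0-255 are reserved for the raw bytes
        ids = merge_pair(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


def decode(ids, merges):
    """Expand merged ids back into bytes and decode them as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    # Dicts preserve insertion order, so each merge only refers to earlier ids.
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    sample = "low lower lowest low low"
    token_ids, learned_merges = train_bpe(sample, num_merges=5)
    print(len(sample.encode("utf-8")), "bytes ->", len(token_ids), "tokens")
    print(decode(token_ids, learned_merges) == sample)
```

Even this small version makes the trade-off visible: the training loop rescans the whole sequence for every merge, which is fine for a sentence but far too slow for a real corpus. That is the kind of engineering work a dedicated library takes care of.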