A reader recently shared a resource on training tokenizers for new languages, which reminded me that I originally wrote a BPE tokenizer for my “LLMs from Scratch” book but never shared it! If you are looking for a weekend project, here you go:
BPE Tokenizer Implementation from LLMs from Scratch Book