ICYMI: Andrej Karpathy released an excellent video tutorial on "Tokenization" a couple of days ago.
⦿ Basics covered: strings, Unicode code points, and encodings like UTF-8.
⦿ Byte pair encoding algorithm explained and implemented in Python.
⦿ Delving into complexities:
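
To make the first two bullets concrete, here is a minimal sketch of byte-level byte pair encoding in Python. It is not the code from the video: the function names (`pair_counts`, `merge`, `train_bpe`, `decode`) and the toy input are assumptions chosen for illustration. The idea is to start from the UTF-8 bytes of a string (ids 0 to 255), then repeatedly replace the most frequent adjacent pair of ids with a newly minted id.

```python
from collections import Counter

def pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace each left-to-right, non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from the raw UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))      # each byte is an initial token id in 0..255
    merges = {}                           # (id, id) -> new id, in the order learned
    for step in range(num_merges):
        counts = pair_counts(ids)
        if not counts:                    # fewer than two tokens left, nothing to merge
            break
        pair = max(counts, key=counts.get)
        new_id = 256 + step               # new ids start right after the byte range
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges

def decode(ids, merges):
    """Expand token ids back into bytes, then into a string."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():  # dicts keep insertion order, so parts exist already
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")

# Unicode code point vs. UTF-8 bytes: "\u00e9" (e with acute accent) is one
# code point, 233, but two bytes in UTF-8.
print(ord("\u00e9"), list("\u00e9".encode("utf-8")))

text = "aaabdaaabac"
ids, merges = train_bpe(text, num_merges=3)
print(len(text.encode("utf-8")), "bytes ->", len(ids), "tokens")
print(decode(ids, merges) == text)
```

On this toy string, three merges shrink 11 bytes to 5 tokens, and decoding recovers the original text. A real tokenizer would train on far more data and typically pre-split the text before merging; this sketch skips those steps.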
Karpathy releases tokenization tutorial covering UTF-8 and byte pair encoding