It's not really doing the same thing: they are doing encoding only, while the tokenizer lib does a lot of extra work, such as keeping track of offsets between the original string and the final tokens, truncating, etc. This is needed to support various model hub tasks like QA.
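To make the offset point concrete, here is a minimal sketch of why offsets matter for a task like QA. It is a toy, standard-library-only illustration, not the tokenizer lib's actual implementation: the function names `encode_with_offsets` and `span_to_text`, the regex-based splitting, and the example sentence are all assumptions made up for this example.

```python
import re
from typing import List, Optional, Tuple

Offsets = List[Tuple[int, int]]


def encode_with_offsets(
    text: str, max_length: Optional[int] = None
) -> Tuple[List[str], Offsets]:
    """Toy encoder (hypothetical): split into words and punctuation,
    lowercase each token, and record its (start, end) character
    offsets in the ORIGINAL string."""
    tokens: List[str] = []
    offsets: Offsets = []
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tokens.append(match.group().lower())  # normalization loses casing
        offsets.append((match.start(), match.end()))
    if max_length is not None:
        # Truncate tokens and offsets together so they stay aligned.
        tokens = tokens[:max_length]
        offsets = offsets[:max_length]
    return tokens, offsets


def span_to_text(text: str, offsets: Offsets, start_tok: int, end_tok: int) -> str:
    """Map an inclusive token span (e.g. a QA model's predicted answer)
    back to the exact substring of the original text."""
    start_char = offsets[start_tok][0]
    end_char = offsets[end_tok][1]
    return text[start_char:end_char]


context = "The Eiffel Tower is in Paris, France."
tokens, offsets = encode_with_offsets(context, max_length=6)

# Pretend a QA model predicted tokens 1..2 as the answer span.
answer = span_to_text(context, offsets, 1, 2)
print(tokens)
print(answer)
```

An encoder that only returns tokens or ids has no way to do the last step: the tokens here are lowercased, so the original casing and spacing of the answer can only be recovered through the offsets.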
Tokenizer Libraries Essential Features for NLP Tasks