AI Dynamics

Global AI News Aggregator

Tokenizer Training and Data Filtering Compliance Standards

Well, the tokenizer I used was trained on large quantities of data — I filtered the tokens based on yet more data from FineWeb. Question is if that's acceptable according to your rules…

→ View original post on X by @alexjc