AI Dynamics

Global AI News Aggregator

BPE token vocabulary depends on frequency in the pre-training corpus

That’s right: the token vocabulary is built with BPE (byte-pair encoding), so a long string gets its own token only when it occurs often in the pre-training corpus.
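To make the frequency effect concrete, here is a minimal toy sketch of BPE training in Python. It is an illustration under stated assumptions, not any production tokenizer: the function names `train_bpe` and `encode` are hypothetical, the two-word corpus is made up, and it works on characters rather than bytes with none of the pre-tokenization real tokenizers apply.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merges from a toy corpus given as a list of words."""
    # Each distinct word is a tuple of symbols, weighted by how often it occurs.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Greedily merge the most frequent pair.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(list(symbols), best))] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize `word` by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


# Hypothetical corpus: "tokenizer" is frequent, "zebra" is rare.
corpus = ["tokenizer"] * 50 + ["zebra"]
merges = train_bpe(corpus, num_merges=8)

print(encode("tokenizer", merges))  # ['tokenizer']
print(encode("zebra", merges))      # ['ze', 'b', 'r', 'a']
```

With a budget of eight merges, the frequent word collapses into a single token, while the rare word stays split and only benefits from the one merge ("ze") that it shares with the frequent word.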

→ View original post on X — @goodside