I feel like model training on languages where the individual characters carry meaning (e.g. Chinese) would have different properties than training on languages with a purely phonetic alphabet, but I'm not sure what those differences would be.
Language Character Systems Impact Model Training Properties