For language modelling and translation, word pieces are the standard way to address this. But for smaller datasets, especially where the decisions need to be made on a per-token basis, it's very helpful to learn representations for words, rather than characters or word pieces.
Word representations learning for smaller datasets and per-token decisions
By
–
Leave a Reply