This is almost what we do in spaCy — except that, instead of reserving the first 9900 rows for common words, we hash every word into the table. Given that the assignment of words to shared rows is arbitrary, how can the model learn good representations?
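A minimal sketch of the idea, assuming a fixed-size table and CRC32 as an illustrative hash function (spaCy's actual implementation uses its own hashing and table layout):

```python
import zlib
import numpy as np

NUM_ROWS = 10_000  # fixed table size, far smaller than an open vocabulary
DIM = 8            # embedding width, kept tiny for the sketch

rng = np.random.default_rng(0)
table = rng.normal(scale=0.1, size=(NUM_ROWS, DIM))

def row_for(word: str) -> int:
    # Deterministically hash the word to a row index; unrelated words
    # may collide on the same row, and that is accepted by design.
    return zlib.crc32(word.encode("utf8")) % NUM_ROWS

def vector(word: str) -> np.ndarray:
    return table[row_for(word)]

# Every string maps to some row, so even words never seen in training
# still get a vector — there is no out-of-vocabulary case.
assert (vector("apple") == vector("apple")).all()
```

The key point is that nothing about the mapping is learned: the hash decides which rows a word shares, and training has to make those shared rows work anyway.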
spaCy’s Hashing Approach to Learning Word Representations