Let's say we have 10k rows in the embedding table. In a standard embedding table, you give each of the 9999 most frequent words their own vector, and have all the others share the last vector.
By
–
Let's say we have 10k rows in the embedding table. In a standard embedding table, you give each of the 9999 most frequent words their own vector, and have all the others share the last vector.