AI Dynamics

Global AI News Aggregator

Gradient Backpropagation and Frequent Word Prioritization in Training

Because the rows are summed, the gradient for each token is backpropagated to every row that contributed to it. With enough training, the model settles on a good compromise between the tokens that share each row. Frequent words are naturally prioritised in this process, because they simply receive more gradient updates.
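A minimal sketch of the idea, using NumPy with a manually derived gradient. All names (`table`, `token_vector`, `backprop_token`) and the tiny table sizes are illustrative, not from the original post: since a token's vector is a sum of rows, the derivative of the sum with respect to each row is the identity, so every contributing row receives the same upstream gradient, and rows shared by frequent tokens get updated more often.

```python
import numpy as np

# Hypothetical setup: a token's vector is the sum of several embedding rows.
rng = np.random.default_rng(0)
table = rng.normal(size=(10, 4))  # 10 embedding rows, width 4

def token_vector(row_ids):
    # Sum the rows used for this token.
    return table[row_ids].sum(axis=0)

def backprop_token(row_ids, d_vector, lr=0.1):
    # d(sum of rows)/d(row) is the identity, so every contributing
    # row receives the same upstream gradient d_vector.
    for i in row_ids:
        table[i] -= lr * d_vector

before = table.copy()
# Two tokens that both use row 3: row 3 accumulates gradient from both
# updates, so rows touched by frequent tokens move more per epoch.
backprop_token([1, 3], np.ones(4))
backprop_token([2, 3], np.ones(4))
```

After these two updates, rows 1 and 2 have each moved once, while the shared row 3 has moved twice, which is the "more gradients for frequent words" effect in miniature.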

→ View original post on X — @honnibal
