AI Dynamics

Global AI News Aggregator

QK Norm and Embedding Weight Sharing in Model Optimization

Very early on in the project, I did a small run with and without QK norm and found that it helped. The same goes for the embedding weight sharing. I'll retry! I'm not tied to any details of the model, and they weren't chosen any more carefully than with a single run. I spent most of the time just…
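For readers unfamiliar with the two techniques named in the post, here is a minimal sketch of what they commonly mean. It is an illustration under stated assumptions, not the author's implementation: the post does not say which normalization is used, so "QK norm" is assumed here to be an RMS normalization (without learned gain) applied to query and key vectors before their dot product, and "embedding weight sharing" is assumed to mean reusing the token embedding table as the output projection. All function names and toy values are hypothetical.

```python
import math

def rms_norm(v, eps=1e-6):
    """Scale a vector to (approximately) unit root-mean-square; no learned gain."""
    rms = math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x / rms for x in v]

def attention_score(q, k, use_qk_norm=True):
    """Scaled dot-product score for a single query/key pair."""
    if use_qk_norm:
        # Assumed form of QK norm: normalize both vectors before the dot product,
        # so the score no longer grows with the magnitude of q or k.
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))

def tied_logits(hidden, embedding):
    """Output logits computed with the input embedding table (weight sharing)."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in embedding]

# Hypothetical toy values: vocabulary of 3 tokens, model dimension 2.
embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
q = [100.0, 0.0]
k = [200.0, 0.0]

raw = attention_score(q, k, use_qk_norm=False)    # grows with vector magnitude
normed = attention_score(q, k, use_qk_norm=True)  # bounded by sqrt(dimension)
logits = tied_logits(embedding[2], embedding)     # one logit per vocabulary row
```

In this sketch the unnormalized score scales with the size of the inputs, while the normalized score stays bounded, which is the usual motivation given for QK norm. Sharing the embedding table removes a separate output matrix, which reduces parameter count. Whether either choice helps a given model is an empirical question, as the post itself suggests.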

Source: original post on X by @karpathy
