AI Dynamics

Global AI News Aggregator

Weights Sharding Strategy: Full Replication Except Last Dimension

Also, a tip on weights sharding: as a generic rule of thumb, you can use full replication for all variable dimensions except the last one, which should be sharded across the "model" axis.
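The rule above can be expressed as a small helper that builds a layout tuple for any variable shape. This is a minimal sketch. It assumes the tuple convention used by Keras 3 `LayoutMap` entries and JAX `PartitionSpec`, where `None` means "replicated along this dimension" and a string names the mesh axis to shard on. The helper names `last_dim_layout` and `per_device_shape`, the kernel shape, and the 8-device "model" axis are hypothetical and chosen only for illustration. The sketch also assumes an even split, so it checks that the last dimension divides cleanly.

```python
from typing import Optional, Sequence, Tuple

# None = replicated along that dimension; a string = shard along that mesh axis.
Layout = Tuple[Optional[str], ...]


def last_dim_layout(ndim: int, model_axis: str = "model") -> Layout:
    """Replicate every dimension except the last, which is sharded on `model_axis`."""
    if ndim == 0:
        # A scalar has no last dimension, so it is simply fully replicated.
        return ()
    return (None,) * (ndim - 1) + (model_axis,)


def per_device_shape(shape: Sequence[int], model_axis_size: int) -> Tuple[int, ...]:
    """Shape of the shard each device holds under `last_dim_layout`.

    Hypothetical even-split assumption: the last dimension must be divisible
    by the number of devices along the "model" axis.
    """
    if not shape:
        return ()  # fully replicated scalar
    *leading, last = shape
    if last % model_axis_size != 0:
        raise ValueError(f"last dim {last} is not divisible by {model_axis_size}")
    return tuple(leading) + (last // model_axis_size,)


# Illustrative Dense kernel (in_features, out_features) on a mesh
# whose "model" axis has 8 devices.
kernel_shape = (4096, 16384)
print(last_dim_layout(len(kernel_shape)))  # (None, 'model')
print(per_device_shape(kernel_shape, 8))   # (4096, 2048)

# A 4-D conv kernel (kh, kw, in_channels, out_channels) follows the same rule.
print(last_dim_layout(4))                  # (None, None, None, 'model')
```

Each device keeps every leading dimension whole and holds only a 1/N slice of the last one. In Keras 3, tuples like these are the values you would assign to `keras.distribution.LayoutMap` entries keyed by variable-path patterns, for example `(None, "model")` for a Dense kernel and `("model",)` for its bias.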

→ View original post on X: @fchollet
