AI Dynamics

Global AI News Aggregator

TPU Dynamic-Size Constraints and GPU Attention Efficiency Trade-offs

On TPUs you can't use dynamic sizes inside a loop, so you use masks and static sizes instead. As a result, the first step of the loop is as costly as the last one. On GPUs you could make the first step faster, but for large models attention is a relatively small share of the total cost, so the saving is limited.
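The sketch below contrasts the two strategies in a toy setting. It is not taken from the original post. It assumes a decoding loop over a fixed-size cache, uses a head dimension of 1 (queries, keys and values are plain floats), and every function name is hypothetical.

- `static_masked_step` always scores all `max_len` cache slots and masks the unfilled ones. This is the static-shape approach, so its work per step is constant.
- `dynamic_step` only touches the `t + 1` filled slots. This needs dynamic shapes, and its work grows with `t`.

`attention_flop_fraction` uses the common rough estimate of per-token forward FLOPs, C ≈ 2N + 2·n_layer·n_ctx·d_model with N ≈ 12·n_layer·d_model² (from Kaplan et al., 2020). It is included only to illustrate why attention is a small share of compute for wide models.

```python
import math

# Toy attention with head dimension 1: queries, keys and values are plain floats.

def _weighted_sum(scores, values):
    # Numerically stable softmax over scores, then a weighted sum of values.
    # Masked slots carry -inf scores, so exp(...) gives exactly 0.0 for them.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return sum(e * v for e, v in zip(exps, values)) / total

def static_masked_step(q, keys, values, t, max_len):
    """Static-shape step: score all max_len cache slots, mask slots after t."""
    scores = [q * keys[i] if i <= t else -math.inf for i in range(max_len)]
    return _weighted_sum(scores, values), max_len  # work: always max_len dot products

def dynamic_step(q, keys, values, t):
    """Dynamic-shape step: score only the t + 1 filled slots."""
    scores = [q * keys[i] for i in range(t + 1)]
    return _weighted_sum(scores, values[: t + 1]), t + 1  # work grows with t

def decode(queries, keys, values, static):
    """Run one attention step per position; return outputs and per-step work."""
    max_len = len(keys)
    outputs, work = [], []
    for t in range(max_len):
        if static:
            out, cost = static_masked_step(queries[t], keys, values, t, max_len)
        else:
            out, cost = dynamic_step(queries[t], keys, values, t)
        outputs.append(out)
        work.append(cost)
    return outputs, work

def attention_flop_fraction(n_layer, d_model, n_ctx):
    """Share of per-token forward FLOPs spent in attention, using the rough
    estimate C ~ 2N + 2 * n_layer * n_ctx * d_model, N ~ 12 * n_layer * d_model**2."""
    dense = 2 * 12 * n_layer * d_model ** 2
    attn = 2 * n_layer * n_ctx * d_model
    return attn / (dense + attn)

if __name__ == "__main__":
    keys = [0.1 * i for i in range(8)]
    values = [float(i) for i in range(8)]
    queries = [1.0] * 8
    out_s, work_s = decode(queries, keys, values, static=True)
    out_d, work_d = decode(queries, keys, values, static=False)
    print(work_s)  # [8, 8, 8, 8, 8, 8, 8, 8]
    print(work_d)  # [1, 2, 3, 4, 5, 6, 7, 8]
    print(all(math.isclose(a, b) for a, b in zip(out_s, out_d)))  # True
    print(attention_flop_fraction(n_layer=32, d_model=4096, n_ctx=2048))  # ~0.04
```

Both strategies produce the same outputs. The masked version simply pays full price from the very first step. Under the stated estimate, the attention share simplifies to n_ctx / (12·d_model + n_ctx). With d_model = 4096 and a 2048-token context, that is about 4% of forward FLOPs, so skipping masked work saves little. Smaller models or longer contexts would raise that share.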

→ View original post on X (@arthurmensch)
