TPU Dynamic-Size Constraints and GPU Attention Efficiency Trade-offs

AI Dynamics

Global AI News Aggregator

04 February 2023 23h27

On TPUs, you can't use dynamic sizes inside a loop, so you use masking with static sizes, and the first step of the loop is therefore as costly as the last one. On GPUs, the first step could be faster, but the saving in attention cost is relatively small for large models.
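The difference between the two approaches can be sketched in plain Python. This is only an illustration of the idea, not actual TPU or GPU code: the function names `attend_dynamic` and `attend_masked`, the list-based cache layout, and the returned count of scored cache slots are all assumptions made for the example. Both functions compute single-query attention at decoding step `step`; the dynamic version slices the cache to the live prefix, while the masked version always scores every slot of the fixed-size cache and sets future positions to negative infinity before the softmax.

```python
import math


def _dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def _softmax_weighted_sum(scores, values):
    """Softmax over scores, then a weighted sum of the matching value rows."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(weights)
    dim = len(values[0])
    # zip stops at len(weights), so only the scored rows contribute.
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attend_dynamic(q, keys, values, step):
    """Dynamic-size variant: score only the first step + 1 cache slots."""
    scores = [_dot(q, k) for k in keys[: step + 1]]
    return _softmax_weighted_sum(scores, values), len(scores)


def attend_masked(q, keys, values, step):
    """Static-size variant: score every slot, then mask the future ones."""
    scores = [_dot(q, k) for k in keys]  # always len(keys) dot products
    masked = [s if i <= step else -math.inf for i, s in enumerate(scores)]
    return _softmax_weighted_sum(masked, values), len(scores)


if __name__ == "__main__":
    max_len, dim = 8, 4
    keys = [[math.sin(i + j) for j in range(dim)] for i in range(max_len)]
    values = [[math.cos(i * j + 1) for j in range(dim)] for i in range(max_len)]
    q = [0.5, -1.0, 0.25, 2.0]
    for step in range(max_len):
        _, n_dyn = attend_dynamic(q, keys, values, step)
        _, n_msk = attend_masked(q, keys, values, step)
        print(f"step {step}: dynamic scored {n_dyn} slots, masked scored {n_msk}")
```

Both variants return the same attention output at every step. Only the amount of work differs: the dynamic version grows with the step index, while the masked version does the full-length computation from the first step onward.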

→ View original post on X: @arthurmensch, 4 February 2023
