AI Dynamics

Global AI News Aggregator

Optimizing minGPT: Performance improvements from 495ms to 102ms

having fun optimizing minGPT today
– base: 495ms
– zero_grad(set_to_none=True): 492
– torch.jit.script gelu: 463
– OMP_PROC_BIND=CLOSE: 453
– torch.backends.cuda.matmul.allow_tf32: 143
– torch.autocast(torch.bfloat16): 121
– FlashAttention: 102
now: more fused kernels more better
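
To see which change contributed most, the timings quoted above can be turned into per-step and cumulative speedups. This is a small arithmetic sketch: the numbers are copied from the post, the unit of the later entries is assumed to be milliseconds (only the base figure carries "ms"), and `step_speedups` is a hypothetical helper name.

```python
# Timings quoted in the post; values after "base" are assumed to be in ms too.
timings_ms = [
    ("base", 495),
    ("zero_grad(set_to_none=True)", 492),
    ("torch.jit.script gelu", 463),
    ("OMP_PROC_BIND=CLOSE", 453),
    ("torch.backends.cuda.matmul.allow_tf32", 143),
    ("torch.autocast(torch.bfloat16)", 121),
    ("FlashAttention", 102),
]


def step_speedups(timings):
    """Return (name, speedup over the previous step, cumulative speedup over base)."""
    base = timings[0][1]
    rows = []
    for (_, prev), (name, cur) in zip(timings, timings[1:]):
        rows.append((name, prev / cur, base / cur))
    return rows


for name, step, total in step_speedups(timings_ms):
    print(f"{name:40s} {step:5.2f}x step  {total:5.2f}x total")
```

By these figures, enabling TF32 matmuls is the largest single step (453 to 143, roughly 3.2x), and the full sequence comes to roughly 4.9x over the base.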

→ View original post on X (@karpathy)
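
For readers who want to try the settings named in the post, here is a minimal sketch of where each one might go in a PyTorch training step. It is not the author's code: `TinyBlock` and `train_step` are hypothetical names, the model is a toy, and `F.scaled_dot_product_attention` (PyTorch 2.x) is used as a stand-in for FlashAttention, since the post does not say which implementation was used. Speedups will depend on hardware; TF32 and bfloat16 mainly help on recent NVIDIA GPUs.

```python
import os

# OMP_PROC_BIND has to be set before the OpenMP runtime starts, so set it
# before importing torch (or export it in the shell instead).
os.environ.setdefault("OMP_PROC_BIND", "CLOSE")

import torch
import torch.nn as nn
import torch.nn.functional as F

# Allow TF32 for float32 matmuls on CUDA (only has an effect on GPUs that support it).
torch.backends.cuda.matmul.allow_tf32 = True


@torch.jit.script
def gelu(x: torch.Tensor) -> torch.Tensor:
    # tanh approximation of GELU; scripting may let the elementwise ops be fused
    return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * x * x * x)))


class TinyBlock(nn.Module):
    """Hypothetical single-head block, only here to exercise the settings."""

    def __init__(self, dim: int = 32):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.fc = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Assumption: stand-in for FlashAttention; PyTorch may dispatch this
        # call to a fused attention kernel when one is available.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.fc(gelu(y))


def train_step(model, opt, x, target, device_type: str) -> float:
    # Run the forward pass in bfloat16 where autocast considers it safe.
    with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
        loss = F.mse_loss(model(x).float(), target)
    loss.backward()
    opt.step()
    # Drop gradients instead of filling them with zeros.
    opt.zero_grad(set_to_none=True)
    return loss.item()


device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyBlock().to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
x = torch.randn(4, 8, 32, device=device)
loss = train_step(model, opt, x, torch.zeros_like(x), device)
```

The sketch falls back to the CPU when no GPU is present so it can be run anywhere, but the timings in the post should not be expected from it.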
