llm.c Optimization: Matching PyTorch Performance After Bug Fix

Highly amusing update, ~18 hours later: llm.c is now down to 26.2 ms/iteration, exactly matching PyTorch (tf32 forward pass). We discovered a bug where we incorrectly called cuBLAS in fp32 math mode, and ademeure contributed a more optimized softmax kernel for very long rows.
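The fp32-vs-tf32 difference comes down to the math mode set on the cuBLAS handle: with the default mode, matmuls run in full fp32, while the TF32 mode lets cuBLAS use tensor cores, as PyTorch does for its tf32 forward pass. A minimal sketch of what such a fix looks like (illustrative only; llm.c's actual code differs):

```cuda
#include <cublas_v2.h>

int main(void) {
    cublasHandle_t handle;
    cublasCreate(&handle);

    // CUBLAS_DEFAULT_MATH keeps GEMMs in full fp32 (the slow path the
    // bug left us on). CUBLAS_TF32_TENSOR_OP_MATH allows cuBLAS to run
    // fp32 GEMMs on TF32 tensor cores, trading a little mantissa
    // precision for a large speedup.
    cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);

    // ... subsequent cublasSgemm calls on this handle may now use
    // TF32 tensor cores ...

    cublasDestroy(handle);
    return 0;
}
```

The mode is per-handle, which is why a single missed call can silently leave every matmul in the slower fp32 path.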