llm.c Day 24: Multi-GPU Training in C/CUDA Outperforms PyTorch

Day 24 of llm.c: we now do multi-GPU training, in bfloat16, with flash attention, directly in ~3000 lines of C/CUDA, and it is FAST! We're running ~7% faster than PyTorch nightly, with no asterisks, i.e. this baseline includes all modern & standard bells-and-whistles: mixed-precision training and flash attention.
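For readers curious what the multi-GPU piece boils down to: in data-parallel training, each GPU computes gradients on its own shard of the batch, and after the backward pass those gradients are averaged across GPUs with an all-reduce before the optimizer step. The sketch below is not llm.c's actual code (its real setup differs, e.g. in how processes and communicators are launched); it is a minimal single-process illustration of that primitive using NCCL's ncclAllReduce over bf16 buffers. The buffer size n and the one-communicator-per-device layout are illustrative assumptions; compile with nvcc and link against -lnccl.

```c
// Minimal data-parallel sketch (NOT the actual llm.c code): average bf16
// gradient buffers across all visible GPUs with an NCCL all-reduce.
// Assumes one process drives every device; build with: nvcc -lnccl
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cuda_bf16.h>
#include <nccl.h>

#define CUDA_CHECK(call) do { cudaError_t e_ = (call); \
    if (e_ != cudaSuccess) { fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e_)); exit(1); } } while (0)
#define NCCL_CHECK(call) do { ncclResult_t r_ = (call); \
    if (r_ != ncclSuccess) { fprintf(stderr, "NCCL error: %s\n", ncclGetErrorString(r_)); exit(1); } } while (0)

int main(void) {
    int ndev = 0;
    CUDA_CHECK(cudaGetDeviceCount(&ndev));
    const size_t n = 1 << 20; // gradient elements per GPU (illustrative size)

    ncclComm_t *comms = (ncclComm_t *)malloc(ndev * sizeof(ncclComm_t));
    cudaStream_t *streams = (cudaStream_t *)malloc(ndev * sizeof(cudaStream_t));
    __nv_bfloat16 **grads = (__nv_bfloat16 **)malloc(ndev * sizeof(__nv_bfloat16 *));

    // one communicator per device, all owned by this single process
    NCCL_CHECK(ncclCommInitAll(comms, ndev, NULL));

    for (int i = 0; i < ndev; i++) {
        CUDA_CHECK(cudaSetDevice(i));
        CUDA_CHECK(cudaStreamCreate(&streams[i]));
        CUDA_CHECK(cudaMalloc(&grads[i], n * sizeof(__nv_bfloat16)));
        // in real training, this buffer would hold GPU i's local gradients
    }

    // average the gradients across all GPUs, in place; group calls are
    // required when one thread issues NCCL ops for multiple devices
    NCCL_CHECK(ncclGroupStart());
    for (int i = 0; i < ndev; i++) {
        NCCL_CHECK(ncclAllReduce(grads[i], grads[i], n, ncclBfloat16, ncclAvg,
                                 comms[i], streams[i]));
    }
    NCCL_CHECK(ncclGroupEnd());

    for (int i = 0; i < ndev; i++) {
        CUDA_CHECK(cudaSetDevice(i));
        CUDA_CHECK(cudaStreamSynchronize(streams[i]));
        CUDA_CHECK(cudaFree(grads[i]));
        CUDA_CHECK(cudaStreamDestroy(streams[i]));
        NCCL_CHECK(ncclCommDestroy(comms[i]));
    }
    free(comms); free(streams); free(grads);
    return 0;
}
```

Doing the all-reduce directly in bfloat16 halves the communication volume relative to fp32 gradients, which is part of why a hand-rolled C/CUDA loop like this can keep pace with (and here, beat) a framework baseline.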