AI Dynamics

Global AI News Aggregator

GPT-2 Reproduction with Increased Channel Size and Memory Optimization

We want to do a full GPT-2 repro, at channel size 1600 this is 2.1X higher C. And we'll want to ~max out batch dim to fit in memory too. So the "easy times" will be over soon.
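To make the scale-up concrete, here is a rough back-of-the-envelope sketch of the arithmetic. The post does not say what the 2.1X is measured against. This sketch assumes the baseline is the smallest GPT-2 configuration (channel size 768, 12 layers) and that the target is the largest published GPT-2 configuration (channel size 1600, 48 layers). The helper names and the 12 * n_layer * C^2 estimate for transformer-block weights are illustrative choices, not something taken from the post.

```python
# Back-of-the-envelope scale-up arithmetic for the channel size C.
# ASSUMPTION: the baseline is GPT-2 small (C = 768, 12 layers); the post
# itself only states the target channel size of 1600.
BASELINE_C = 768      # assumed baseline channel size
BASELINE_LAYERS = 12  # assumed baseline depth
TARGET_C = 1600       # channel size named in the post
TARGET_LAYERS = 48    # depth of the largest published GPT-2 config


def channel_ratio(target_c: int, baseline_c: int) -> float:
    """How many times wider the target model is than the baseline."""
    return target_c / baseline_c


def approx_block_params(n_layer: int, c: int) -> int:
    """Rough weight count of the transformer blocks only.

    Per layer: 4*C^2 for attention (QKV + output projection) plus
    8*C^2 for the MLP (C -> 4C -> C). Biases, layernorms, and
    embeddings are ignored.
    """
    return 12 * n_layer * c * c


ratio = channel_ratio(TARGET_C, BASELINE_C)
small = approx_block_params(BASELINE_LAYERS, BASELINE_C)
large = approx_block_params(TARGET_LAYERS, TARGET_C)

print(f"channel ratio: {ratio:.2f}x")
print(f"approx block params, baseline: {small / 1e6:.0f}M")
print(f"approx block params, target:   {large / 1e6:.0f}M")
print(f"approx block param ratio:      {large / small:.1f}x")
```

Under these assumptions the channel ratio rounds to the 2.1X mentioned in the post, while the block weights grow much faster than C does because they scale with depth times C squared. That is one way to read why the batch dimension would need to be tuned carefully to fit in memory.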

→ View original post on X: @karpathy
