AI Dynamics

Global AI News Aggregator

Building Transformers: Self-Attention, Training, and GPT-3 Comparison

The second hour or so builds up the Transformer: multi-headed self-attention, the MLP, residual connections, and layernorms. Then we train one and compare it to OpenAI's GPT-3 (spoiler: ours is roughly 10K to 1M times smaller, but essentially the same neural net) and to ChatGPT (ours is pretraining only, with none of ChatGPT's finetuning).
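A minimal sketch of how the pieces named above fit into one Transformer block, written in plain Python with no dependencies so the data flow stays visible. It assumes a GPT-style decoder (causal self-attention) with pre-norm residual connections. Learned layernorm gain/bias, linear-layer biases, dropout, and token/position embeddings are omitted, and the MLP uses ReLU for simplicity where GPT-2/GPT-3 use GELU. All function names, shapes, and the random initialization are hypothetical and for illustration only; this is not the code from the original post.

```python
import math
import random

def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m) on plain nested lists
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def add(a, b):
    # elementwise sum of two (n x m) matrices, used for the residual connections
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def layernorm(x, eps=1e-5):
    # normalize each token's vector to zero mean, unit variance (no learned gain/bias here)
    out = []
    for row in x:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        out.append([(v - mu) / math.sqrt(var + eps) for v in row])
    return out

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def causal_attention_head(x, wq, wk, wv):
    # one head: position i attends only to positions j <= i (causal mask)
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    head_size = len(wq[0])
    out = []
    for i in range(len(x)):
        scores = [sum(q[i][d] * k[j][d] for d in range(head_size)) / math.sqrt(head_size)
                  for j in range(i + 1)]
        weights = softmax(scores)
        out.append([sum(weights[j] * v[j][d] for j in range(i + 1)) for d in range(head_size)])
    return out

def multi_head_attention(x, heads, wproj):
    # run every head, concatenate their outputs per token, project back to n_embd
    outs = [causal_attention_head(x, wq, wk, wv) for (wq, wk, wv) in heads]
    concat = [sum((o[t] for o in outs), []) for t in range(len(x))]
    return matmul(concat, wproj)

def mlp(x, w1, w2):
    # position-wise feed-forward: expand, ReLU, project back
    h = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    return matmul(h, w2)

def transformer_block(x, p):
    # pre-norm residual block: x = x + attn(ln(x)); x = x + mlp(ln(x))
    x = add(x, multi_head_attention(layernorm(x), p["heads"], p["wproj"]))
    x = add(x, mlp(layernorm(x), p["w1"], p["w2"]))
    return x

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

def init_block(n_embd, n_head, rng):
    # hypothetical random init; each head gets its own (wq, wk, wv)
    head_size = n_embd // n_head
    return {
        "heads": [tuple(rand_matrix(n_embd, head_size, rng) for _ in range(3))
                  for _ in range(n_head)],
        "wproj": rand_matrix(n_embd, n_embd, rng),
        "w1": rand_matrix(n_embd, 4 * n_embd, rng),
        "w2": rand_matrix(4 * n_embd, n_embd, rng),
    }

if __name__ == "__main__":
    rng = random.Random(0)
    x = rand_matrix(5, 8, rng)      # 5 tokens, n_embd = 8
    block = init_block(8, 2, rng)   # 2 heads of size 4
    y = transformer_block(x, block)
    print(len(y), len(y[0]))
```

The block preserves the input shape, which is what lets GPT-style models stack many identical blocks; because of the causal mask, changing a later token never changes the outputs at earlier positions.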
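To see roughly where a size gap of this magnitude comes from, here is a back-of-the-envelope parameter count for a GPT-style decoder. It assumes each block holds about 12·n_embd² weights (4·n_embd² for the query, key, value and output projections, 8·n_embd² for an MLP with 4× expansion), ignores biases and layernorm parameters, and assumes the output head shares weights with the token embedding. The GPT-3 settings (96 layers, d_model 12288, a 50257-token vocabulary, a 2048-token context) are the published ones for the 175B model; the small configuration is a hypothetical character-level model chosen only for illustration.

```python
def gpt_param_count(n_layer, n_embd, vocab_size, block_size):
    # per block: attention q, k, v, output projections (4 * n_embd^2)
    # plus an MLP with 4x expansion (8 * n_embd^2); biases and layernorms ignored
    per_block = 12 * n_embd * n_embd
    # token embedding + learned position embedding; output head assumed tied
    embeddings = vocab_size * n_embd + block_size * n_embd
    return n_layer * per_block + embeddings

# published GPT-3 175B settings
gpt3 = gpt_param_count(n_layer=96, n_embd=12288, vocab_size=50257, block_size=2048)
# hypothetical small character-level model, for illustration only
small = gpt_param_count(n_layer=6, n_embd=384, vocab_size=65, block_size=256)

print(f"GPT-3 estimate: {gpt3 / 1e9:.1f}B parameters")
print(f"small model:    {small / 1e6:.1f}M parameters")
print(f"ratio:          {gpt3 / small:,.0f}x")
```

Under these assumptions the GPT-3 estimate lands close to 175B and the ratio to the toy configuration is on the order of 10^4; the sketch only covers this one assumed configuration, not the full range quoted.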

→ View original post on X — @karpathy