Yo! @allen_ai just dropped the OLMo 2 tech report. Some interesting things I found:
Architecture
> Reordered norm: Normalizing outputs of attention and feedforward layers within transformer blocks instead of inputs
> QK-norm: Normalizing key and query projections with RMSNorm
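Rough sketch of what those two bullets mean, in NumPy. This is not the OLMo 2 code: the function names, the `p` weight dict, the ReLU MLP stand-in, the missing learnable RMSNorm gains, and normalizing q/k over the full projection (rather than per head) are all assumptions made to keep it short.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMSNorm over the last axis (learnable gain omitted for brevity).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def softmax(x):
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def attention_qk_norm(x, p, n_heads):
    seq, d_model = x.shape
    d_head = d_model // n_heads
    # QK-norm: RMSNorm on the query and key projections before the scores.
    q = rms_norm(x @ p["wq"])
    k = rms_norm(x @ p["wk"])
    v = x @ p["wv"]
    # Split into heads: (n_heads, seq, d_head).
    q, k, v = (t.reshape(seq, n_heads, d_head).transpose(1, 0, 2) for t in (q, k, v))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    # Causal mask: a token cannot attend to later positions.
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    out = (softmax(scores) @ v).transpose(1, 0, 2).reshape(seq, d_model)
    return out @ p["wo"]

def feed_forward(x, p):
    # Simplified ReLU MLP, used here only as a placeholder feedforward layer.
    return np.maximum(x @ p["w1"], 0.0) @ p["w2"]

def block_reordered_norm(x, p, n_heads):
    # Reordered norm: normalize the OUTPUT of each sublayer, then add the residual.
    h = x + rms_norm(attention_qk_norm(x, p, n_heads))
    return h + rms_norm(feed_forward(h, p))

def block_pre_norm(x, p, n_heads):
    # Conventional pre-norm for contrast: normalize the INPUT of each sublayer.
    h = x + attention_qk_norm(rms_norm(x), p, n_heads)
    return h + feed_forward(rms_norm(h), p)
```

The only difference between the two block functions is where `rms_norm` sits relative to the sublayer call, which is the whole point of the "reordered" bullet.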
OLMo 2 introduces reordered norm and QK-norm innovations
