AI Dynamics

Global AI News Aggregator

About

RoPE Scaling Enables 16K Token Context Window Models

Yeah — simply multiplied rope * 2, trained on long data with a MSL of 16K.

→ View original post on X — @mattshumer_