AI Dynamics

Global AI News Aggregator

Tele-FLM-1T: Progressive Training and Depth Growth Techniques

Here are the juicy lessons from the 1T-parameter model Tele-FLM-1T:
– Trained progressively in 3 stages: 52B → 102B → 1T parameters on ~2T tokens
– Depth growth technique: selects layers to duplicate based on their input-output distance metrics, prioritizing middle layers with smaller…
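The depth-growth idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Tele-FLM-1T's actual code. The distance metric (mean cosine distance between a layer's input and output hidden states), the hypothetical `edge` parameter that excludes the outermost layers so middle layers are preferred, and the reading that "smaller" refers to smaller distances are all assumptions, since the post is truncated. The function names are hypothetical as well.

```python
import copy
import math
from typing import List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity (ASSUMED metric; vectors must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def layer_distances(hidden_states: List[List[Sequence[float]]]) -> List[float]:
    """hidden_states[i][t] is token t's vector entering layer i, and
    hidden_states[i + 1][t] is that layer's output. Returns the mean
    input-output distance of each layer over all tokens."""
    dists = []
    for i in range(len(hidden_states) - 1):
        per_token = [
            cosine_distance(x, y)
            for x, y in zip(hidden_states[i], hidden_states[i + 1])
        ]
        dists.append(sum(per_token) / len(per_token))
    return dists


def select_layers_to_duplicate(
    dists: List[float], num_new: int, edge: int = 1
) -> List[int]:
    """Pick `num_new` layers with the smallest distance, skipping `edge`
    layers at each end (hypothetical way to favor middle layers)."""
    candidates = range(edge, len(dists) - edge)
    ranked = sorted(candidates, key=lambda i: dists[i])
    return sorted(ranked[:num_new])


def grow_depth(layers: list, chosen: List[int]) -> list:
    """Insert an independent deep copy of each chosen layer right after it."""
    chosen_set = set(chosen)
    grown = []
    for i, layer in enumerate(layers):
        grown.append(layer)
        if i in chosen_set:
            grown.append(copy.deepcopy(layer))
    return grown


if __name__ == "__main__":
    dists = [0.9, 0.2, 0.05, 0.1, 0.3, 0.8]  # toy per-layer distances
    chosen = select_layers_to_duplicate(dists, num_new=2)
    print(chosen)  # [2, 3]
    print(grow_depth(["L0", "L1", "L2", "L3", "L4", "L5"], chosen))
```

Placing each copy directly after its source layer keeps the original layer order intact. The intuition assumed here is that a layer whose output stays close to its input changes the residual stream little, so duplicating it should disturb the grown model less than duplicating a layer that transforms its input heavily.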

→ View original post on X — @maximelabonne