AI Dynamics

Global AI News Aggregator


DistBelief: Distributed Training and Knowledge Distillation Framework

Or https://arxiv.org/abs/1503.02531, which describes how DistBelief was used to train the baseline model (quite a large model for the 2014/2015 time frame), and later describes using the framework to train specialists and then distill them into a single model.
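For readers unfamiliar with distillation, the general idea can be sketched as follows: a smaller student model is trained to match the temperature-softened output distribution of a larger teacher, alongside the usual loss on the true label. This is only a minimal illustration of a soft-target loss for one example and one teacher. It is not the DistBelief code and it does not reproduce the specialist procedure from the paper. The function names and the default values of `temperature` and `alpha` are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy of predicted_probs measured against target_probs."""
    return -sum(
        t * math.log(p) for t, p in zip(target_probs, predicted_probs) if t > 0
    )


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    temperature and alpha are illustrative defaults, not values from the source.
    """
    # Soft term: student and teacher are compared at the same raised temperature.
    soft_targets = softmax(teacher_logits, temperature)
    soft_preds = softmax(student_logits, temperature)
    # Scaling by temperature**2 keeps the soft term's gradient magnitude
    # comparable to the hard term's when the temperature changes.
    soft_loss = cross_entropy(soft_targets, soft_preds) * temperature ** 2

    # Hard term: ordinary cross-entropy on the true label at temperature 1.
    hard_preds = softmax(student_logits, 1.0)
    hard_loss = -math.log(hard_preds[true_label])

    return alpha * soft_loss + (1 - alpha) * hard_loss


if __name__ == "__main__":
    teacher = [5.0, 1.0, 0.0]
    print(distillation_loss([5.0, 1.0, 0.0], teacher, true_label=0))
    print(distillation_loss([0.0, 1.0, 5.0], teacher, true_label=0))
```

A student whose logits agree with the teacher gets a lower loss than one that disagrees, which is the signal that pulls the student toward the teacher's behavior during training.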

→ View original post on X (@jeffdean)