This model convergence is quite perplexing. Possibly related to recent results on subliminal learning? Basically, deeper knowledge correlations transfer when a model is trained via distillation. As the amount of LLM-generated data online increases, it’s possible this makes models converge to some
Model Convergence Through Distillation and Subliminal Learning Transfer