AI Dynamics

Global AI News Aggregator


MoE Architecture Changes: Softmax to Sigmoid Gate Function

Dug a bit more into the modelling code (v2 vs v3). Here are the key changes:
> MoE gate function changed from softmax (v2) → sigmoid (v3)
> New top-k selection method `noaux_tc`
> Added `e_score_correction_bias` for better expert selection or even training
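
To make the three changes concrete, here is a rough sketch of how the two gating styles might differ. It is not the actual modelling code: the function names (`route_softmax`, `route_sigmoid_with_bias`) and the `correction_bias` argument are made up for illustration, and it assumes the bias only influences which experts get picked, while the final gate weights come from the unbiased sigmoid scores and are renormalised over the chosen experts. The real `noaux_tc` method may include extra steps (such as grouping experts) that are left out here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def route_softmax(logits, top_k):
    """v2-style sketch: softmax scores compete across experts, keep the top_k."""
    scores = softmax(logits)
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    return chosen, [scores[i] for i in chosen]


def route_sigmoid_with_bias(logits, correction_bias, top_k):
    """v3-style sketch (assumed behaviour, simplified).

    Each expert gets an independent sigmoid score. A per-expert bias
    (standing in for e_score_correction_bias) is added for selection only.
    """
    scores = [sigmoid(x) for x in logits]
    # Assumption: the bias shifts the ranking but not the gate weights.
    biased = [s + b for s, b in zip(scores, correction_bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]
    # Assumption: weights are the unbiased scores, renormalised to sum to 1.
    picked = [scores[i] for i in chosen]
    total = sum(picked)
    return chosen, [p / total for p in picked]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]   # toy router outputs for 4 experts
    bias = [0.0, 0.0, 0.3, 0.0]      # toy bias nudging expert 2 upward
    print(route_softmax(logits, top_k=2))
    print(route_sigmoid_with_bias(logits, bias, top_k=2))
```

In this toy run the softmax router picks experts 0 and 1, while the biased sigmoid router picks experts 2 and 0: the bias lifts an otherwise under-used expert into the top-k without inflating its gate weight, which is one plausible reading of "better expert selection or even training".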

→ View original post on X — @reach_vb