I might have heard the same. I guess info like this is passed around, but no one wants to say it out loud.
GPT-4: 8 x 220B experts trained with different data/task distributions and 16-iter inference.
Glad that Geohot said it out loud. Though, at this point, GPT-4 is
GPT-4 Architecture: 8 Experts with 16-Iteration Inference