Nobody's been talking about it, but it's rather *mind-blowing* imo that the open-source Falcon 40B model is topping LLaMa 65B on leaderboards and many evals while having required not even half of LLaMa's compute to train from scratch. Quick back-of-the-envelope calculations:
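A rough sketch of that kind of estimate, not the author's own numbers: it assumes the common approximation that training compute is about 6 × parameters × tokens, and it assumes the publicly reported token counts of roughly 1T for Falcon 40B and 1.4T for LLaMa 65B. The function and variable names are made up for illustration.

```python
# Back-of-the-envelope training compute, assuming C ≈ 6 * N * D
# (N = parameter count, D = training tokens). This is an approximation,
# not a measured figure.

def train_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6 * params * tokens

# Assumed inputs: publicly reported sizes and token counts.
falcon_40b = train_flops(params=40e9, tokens=1.0e12)
llama_65b = train_flops(params=65e9, tokens=1.4e12)

ratio = falcon_40b / llama_65b

print(f"Falcon 40B: {falcon_40b:.2e} FLOPs")
print(f"LLaMa 65B:  {llama_65b:.2e} FLOPs")
print(f"Falcon / LLaMa compute ratio: {ratio:.2f}")
```

Under these assumptions the ratio lands a bit under one half, which is in line with the "not even half the compute" claim; different token counts or a different compute rule would shift the exact figure.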
Falcon 40B Outperforms LLaMa 65B With Half Training Compute