So many problems, I don't know where to begin.
– Yeah, put sparse and dense models in the same plot against # of params. Good job.
– I'm sure you know the sizes of PaLM 2 and GPT-4.
– FWIW, T5 is still one of the best LMs out there, and it started way earlier than 2021.
Comparing Language Models: Sparse, Dense, and Parameter Scaling Issues