Great post! Just a clarification: was Flan-T5 further fine-tuned on any data, or were both GPT-3.5 and Flan-T5 evaluated few-/zero-shot?
@yitayml
-
Estimate: Few Organizations Can Train 100B+ Parameter Models
Definitely way less than 200. There's a wide spectrum of what it means to "train 100B+ parameter models", but I would optimistically estimate this number to be <50.
-
Inductive Bias and Data Shape Emergence of AI Abilities
Inductive bias, data, and other changes do influence the point at which emergent abilities appear. We showed this in the UL2R paper (https://arxiv.org/abs/2210.11399). I also wrote a blog post:
-
Transformer Paper Published in ACM Template Format Questioned
Looks cool, but a fundamental Transformer modeling paper in the ACM template, though… Why??
-
Professional Departure: Best Wishes for Next Chapter
Sad to see you go, HW! All the best in your next chapter!