Insightful thread by @hwchung27 that shares some behind-the-scenes details of training Flan-T5. These details are often overlooked: sometimes they are relegated to footnotes in papers, and sometimes they are not mentioned at all (i.e., deemed "not paper worthy"). Sometimes they are stack-dependent, model-dependent or
Behind-the-scenes training details of Flan-T5 model revealed