At the moment we initialize from the GPT-2 weights and fine-tune. This was very useful for debugging, and back when the code was slower. There is no code yet to initialize from scratch, and so none to warm up the learning rate, etc. It should be a very short addition, though.
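To give a rough idea of how short the learning-rate warmup piece could be, here is a minimal sketch. It is not code from this project: the function name `get_lr`, the cosine decay after the warmup, and every default value are assumptions chosen for illustration only.

```python
import math

def get_lr(step, max_lr=6e-4, min_lr=6e-5, warmup_steps=100, max_steps=1000):
    """Learning rate for a given optimizer step (hypothetical helper, illustrative defaults)."""
    # Linear warmup: ramp from max_lr / warmup_steps up to max_lr.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    # Past the end of the schedule: hold at the floor value.
    if step >= max_steps:
        return min_lr
    # In between: cosine decay from max_lr down to min_lr.
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (max_lr - min_lr)
```

In a PyTorch training loop this would typically be applied once per step by writing the returned value into `param_group["lr"]` for each entry of `optimizer.param_groups`. Warmup matters mostly when training from random weights; when fine-tuning from the pretrained GPT-2 weights, a small constant learning rate is often enough.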
GPT-2 Weight Initialization and Fine-tuning Strategy