The pre-training stage is where the billions of words of training data come into play. The instruction-tuning / RLHF stage is where human labelers are asked to vote on which generations are "best"; that's the part that might influence things like "delve": https://openai.com/research/instruction-following
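To make the labeling step concrete, here is a minimal sketch of how "voting on which generation is best" typically feeds into training: labelers produce chosen/rejected pairs, and a reward model is fit with a Bradley-Terry style comparison loss. The data, the `reward` stand-in function, and all names here are hypothetical illustrations, not OpenAI's actual pipeline.

```python
import math

# Hypothetical labeler data: for one prompt, which of two
# generations did the human prefer?
comparisons = [
    {"prompt": "Summarize the report.",
     "chosen": "The report covers three findings.",
     "rejected": "Let's delve into the report together."},
]

def reward(text: str) -> float:
    # Stand-in scorer; in practice this is a trained neural reward model.
    return -0.1 * len(text)

def preference_loss(chosen: str, rejected: str) -> float:
    # Bradley-Terry comparison loss: -log sigmoid(r_chosen - r_rejected).
    # Minimizing it pushes the reward model to score the human-preferred
    # generation higher than the rejected one.
    diff = reward(chosen) - reward(rejected)
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

for pair in comparisons:
    print(f"loss = {preference_loss(pair['chosen'], pair['rejected']):.3f}")
```

Because stylistic quirks in what labelers mark as "chosen" (say, a fondness for "delve") get baked into the reward signal, the tuned model can end up amplifying them.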
Pre-training and Instruction-Tuning in Large Language Models