Fun fact: In my book Build a Large Language Model (From Scratch), I reused the pre-training function in the supervised instruction fine-tuning chapter to show this as clearly and intuitively as possible. Only the dataset (its structure) changes.
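Here is a minimal sketch of that idea. It is not the book's code: the whitespace tokenizer, the simplified prompt template, and the bigram-count "model" are stand-ins chosen so the example runs without dependencies, and all function names are hypothetical. The point is only that `train_model` is called unchanged for both stages, while the dataset-building functions differ.

```python
from collections import defaultdict


def tokenize(text):
    # Toy whitespace tokenizer (assumption: stand-in for a real BPE tokenizer).
    return text.split()


def next_token_pairs(tokens):
    # Targets are the inputs shifted by one position (next-token prediction).
    return list(zip(tokens[:-1], tokens[1:]))


def pretraining_dataset(raw_texts):
    # Pre-training: plain text with no special structure.
    return [next_token_pairs(tokenize(t)) for t in raw_texts]


def instruction_dataset(entries):
    # Fine-tuning: each entry is first formatted into one prompt-plus-response
    # string (assumption: a simplified instruction/response template).
    formatted = [
        f"### Instruction: {e['instruction']} ### Response: {e['output']}"
        for e in entries
    ]
    return [next_token_pairs(tokenize(t)) for t in formatted]


def train_model(model, dataset, num_epochs=1):
    # Shared training function: it only ever sees (input, target) pairs,
    # so it does not care which stage produced them.
    for _ in range(num_epochs):
        for sample in dataset:
            for inp, tgt in sample:
                model[inp][tgt] += 1  # toy "update": count bigrams
    return model


def predict_next(model, token):
    # Return the most frequent follower of `token`, or None if unseen.
    candidates = model.get(token)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


model = defaultdict(lambda: defaultdict(int))

# Stage 1: pre-training on raw text.
train_model(model, pretraining_dataset(["the cat sat on the mat", "the cat ran"]))

# Stage 2: instruction fine-tuning with the same function, different dataset.
train_model(
    model,
    instruction_dataset([{"instruction": "Greet the user", "output": "Hello there"}]),
)

print(predict_next(model, "Response:"))
```

In a real setup the counting step would be replaced by a forward pass, a cross-entropy loss on the shifted targets, and an optimizer step, but the division of labor would stay the same: the data pipeline encodes the difference between stages, not the training loop.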
Pre-training and Fine-tuning Share Same Function in LLM Book