If not done carefully, fine-tuning can collapse a model's output entropy in fairly arbitrary ways and leave it miscalibrated; see, e.g., Figure 8 of the GPT-4 technical report, which shows calibration on MMLU. In other words, when a miscalibrated model assigns 50% probability to a class, it is not correct 50% of the time: its confidence no longer matches its accuracy.
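One common way to put a number on this mismatch is the expected calibration error (ECE): group predictions into confidence bins and average the gap between mean confidence and accuracy in each bin, weighted by bin size. The sketch below is only an illustration, not the method used in the GPT-4 report; the function name, the equal-width binning, and the default of 10 bins are all assumptions made for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE with equal-width confidence bins (an assumption).

    confidences: predicted probability of the chosen class, each in [0, 1]
    correct:     1/True if the prediction was right, 0/False otherwise
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to a bin by its confidence; 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, float(hit)))

    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(h for _, h in items) / len(items)
        # Weight each bin's confidence/accuracy gap by its share of predictions.
        ece += (len(items) / n) * abs(accuracy - avg_conf)
    return ece


# A model that says 50% and is right half the time is calibrated.
calibrated = expected_calibration_error([0.5] * 4, [1, 0, 1, 0])
# A model that says 50% but is right only a quarter of the time is not.
overconfident = expected_calibration_error([0.5] * 4, [1, 0, 0, 0])
```

Here `calibrated` comes out as 0 and `overconfident` as 0.25, the gap between the stated 50% confidence and the observed 25% accuracy. Comparing this value before and after fine-tuning is one rough way to check whether calibration has degraded.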
Fine-tuning Miscalibration: Model Confidence Doesn’t Match Accuracy