Tracking Per-Token Loss to Measure Document Contribution in LLM Training

It would be interesting if LLM training tracked per-token loss back to the source material. That would give an objective measure of how much each specific book or document contributed to training. It might even say something useful about human learning!
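To make the idea concrete, here is a minimal sketch of attributing per-token loss to source documents. It assumes each training token carries a label for the document it came from; the names `token_losses`, `loss_by_document`, and the `book_a` / `book_b` labels are hypothetical, and the probabilities are toy values rather than output from a real model. Whether raw loss, or its change over the course of training, is the right proxy for "contribution" is left open.

```python
import math
from collections import defaultdict


def token_losses(probs, targets):
    """Per-token cross-entropy: -log p(target) at each position.

    probs   : list of per-position probability distributions over the vocabulary
    targets : list of the correct token index at each position
    """
    return [-math.log(p[t]) for p, t in zip(probs, targets)]


def loss_by_document(losses, doc_ids):
    """Aggregate per-token losses by source-document label (hypothetical labels)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for loss, doc in zip(losses, doc_ids):
        totals[doc] += loss
        counts[doc] += 1
    return {
        doc: {
            "tokens": counts[doc],
            "total_loss": totals[doc],
            "mean_loss": totals[doc] / counts[doc],
        }
        for doc in totals
    }


# Toy data: four tokens over a three-word vocabulary, two source documents.
probs = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.125, 0.125, 0.75],
    [0.25, 0.25, 0.50],
]
targets = [0, 1, 2, 0]
doc_ids = ["book_a", "book_a", "book_b", "book_b"]

report = loss_by_document(token_losses(probs, targets), doc_ids)
for doc, stats in report.items():
    print(doc, stats)
```

Logging such a report at several checkpoints and comparing how each document's mean loss falls over time would be one way to approximate how much the model learned from it, under the assumptions above.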