The expectation is replaced by an average over tokens (a few trillion for the largest LMs), so F can be very general. If a human is selecting the y’s among other y’s to create a dataset, then the human is F, hopefully a sensible one.
Data Selection and Distribution Functions in Large Language Models