this all started from a quest to come up with a proper measurement of model memorization it's hard to compute *per-example* memorization, because models "share" info between datapoints so we start with random uniform strings, where sharing isn't possible. and we get this:
