I also wonder if taking the mean of several embeddings makes them “out of distribution” somehow, I'm really not sure. About the muster question! I convinced myself it’s because embeddings are sort of “compressed” and it takes a lot of effort to decompress them.
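On the averaging thought, here's a tiny toy sketch of one way the mean could drift off-distribution: if the individual embeddings are unit-length, their mean generally isn't. Big caveat: this uses random unit vectors as a stand-in, and real embeddings are much more correlated than that, so the shrinkage is probably less dramatic in practice. The helper names (`random_unit_vector`, `mean_vector`, `l2_norm`) and the sizes (256 dims, 8 vectors) are made up for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a random direction and scale it to length 1 (toy stand-in for an embedding)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def l2_norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


rng = random.Random(0)  # fixed seed so the run is repeatable
vectors = [random_unit_vector(256, rng) for _ in range(8)]
mean_norm = l2_norm(mean_vector(vectors))

# Every input has length 1; the mean ends up noticeably shorter.
print(f"norm of each input: 1.000, norm of their mean: {mean_norm:.3f}")
```

If whatever consumes the embedding only ever saw unit-length vectors, a shorter mean is at least one concrete sense in which it could be "out of distribution". Re-normalizing the mean fixes the length, though not necessarily the direction.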