Looking at the "Dataset Contamination" section of the PaLM paper, contamination does not seem to make a big difference, does it? That makes sense: it would be surprising if it did after data deduplication plus a single pass over the training set (also, our Common Crawl data is from 2020 or earlier).
Dataset Contamination Impact in PaLM Models Appears Minimal