Check opt-outs themselves! LAION relies on CommonCrawl having allegedly checked for opt-outs, but the legal defense "our upstream provider is compliant, so we don't need to be" is weak. Instead, LAION could have checked each site's robots.txt themselves when creating the dataset.
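As a rough illustration of what such a check could look like, here is a minimal sketch using Python's standard `urllib.robotparser`. It does not describe LAION's actual pipeline: the user agent name `example-dataset-bot`, the helper names, and the `fetch_robots` callable (which is expected to return the robots.txt text for a given URL) are all hypothetical.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def filter_urls(urls, user_agent, fetch_robots):
    """Keep only URLs whose host's robots.txt permits user_agent.

    fetch_robots is a hypothetical callable: it takes a robots.txt URL
    and returns the file's text (an empty string if there is none).
    """
    parsers = {}  # one parsed robots.txt per origin, so each is fetched once
    kept = []
    for url in urls:
        parts = urlsplit(url)
        origin = f"{parts.scheme}://{parts.netloc}"
        if origin not in parsers:
            parser = RobotFileParser()
            parser.parse(fetch_robots(origin + "/robots.txt").splitlines())
            parsers[origin] = parser
        if parsers[origin].can_fetch(user_agent, url):
            kept.append(url)
    return kept


# Assumed example rules: everything under /private/ is off limits.
EXAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
```

Calling `filter_urls(image_urls, "example-dataset-bot", fetch_robots)` while assembling the dataset would drop every URL that the hosting site has disallowed for that user agent. Note that this only reflects a site's robots.txt at the moment it is fetched, which may differ from what the upstream crawler saw.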
LAION Dataset Compliance: Robots.txt Verification Ethics