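Database Storage for LAION-5B: Removing Illegal Content Safely
Use a database storage method, not version control. If the LAION-5B dataset is kept under revision control, nothing can be removed (e.g. private data or illegal content) without either leaving a trace or nuking the history. Deleting a file in a new commit still leaves it retrievable from earlier commits, and purging it completely means rewriting history. Storing the dataset in a database behind a wrapper API would avoid this problem, since a flagged record can be deleted outright and clients only see what the API serves.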
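Here is a minimal sketch of the database-plus-wrapper idea. It assumes each record is an image URL plus a caption, since LAION-5B distributes URLs and captions rather than the images themselves. The class `DatasetStore`, its methods, and the `pairs` table schema are hypothetical. SQLite, through Python's standard `sqlite3` module, stands in for whatever database would actually be used. Unlike a commit history, `remove` deletes the row itself, and `PRAGMA secure_delete` asks SQLite to overwrite the freed content with zeros.

```python
import sqlite3
from typing import Optional, Tuple


class DatasetStore:
    """Hypothetical wrapper API over a SQLite table of image-text pairs.

    Clients go through these methods instead of reading the table directly,
    so a removed record simply stops being served.
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        # Ask SQLite to overwrite deleted content with zeros instead of
        # leaving it in free pages of the database file.
        self.conn.execute("PRAGMA secure_delete = ON")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pairs ("
            " id INTEGER PRIMARY KEY,"
            " url TEXT NOT NULL,"
            " caption TEXT NOT NULL)"
        )
        self.conn.commit()

    def add(self, url: str, caption: str) -> int:
        """Insert one image-text pair and return its ID."""
        cur = self.conn.execute(
            "INSERT INTO pairs (url, caption) VALUES (?, ?)", (url, caption)
        )
        self.conn.commit()
        return cur.lastrowid

    def remove(self, pair_id: int) -> bool:
        """Delete a record outright; returns True if a row was removed."""
        cur = self.conn.execute("DELETE FROM pairs WHERE id = ?", (pair_id,))
        self.conn.commit()
        return cur.rowcount > 0

    def get(self, pair_id: int) -> Optional[Tuple[str, str]]:
        """Return (url, caption), or None if the record does not exist."""
        return self.conn.execute(
            "SELECT url, caption FROM pairs WHERE id = ?", (pair_id,)
        ).fetchone()

    def count(self) -> int:
        """Number of records currently served."""
        return self.conn.execute("SELECT COUNT(*) FROM pairs").fetchone()[0]
```

After `store.remove(pair_id)`, a later `store.get(pair_id)` returns `None`, with no earlier revision to check out. This only governs the live database. Backups, replicas, and copies that clients have already downloaded would still need their own handling.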