This Post investigation underscores why studying training data is so important. Even after filtering, Google's widely used C4 dataset is riddled with white supremacist, anti-trans, pro-Jan. 6 riot, and QAnon Pizzagate content. That's just for starters.
Google’s C4 Dataset Contains Harmful Extremist Content Despite Filtering