Possibly yes. More broadly, if we're going to be doing more large training runs at all, I'd guess we should start filtering the datasets soon (presumably using a previous-generation LLM finetuned for that?). It's hard to finetune a cognition out of a model once the base has learned it.
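The filtering step gestured at here could be sketched roughly as follows. This is a minimal illustration, not anyone's actual pipeline: `classifier_score` is a hypothetical stand-in for a finetuned previous-generation model's judgment, stubbed with a keyword check so the sketch runs.

```python
def classifier_score(text: str) -> float:
    """Placeholder for a finetuned-LLM judgment in [0, 1]: the
    probability that this sample should be excluded from training.
    Stubbed with a trivial keyword heuristic for illustration."""
    return 1.0 if "unwanted-pattern" in text else 0.0

def filter_dataset(samples: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only samples scored below the exclusion threshold."""
    return [s for s in samples if classifier_score(s) < threshold]

corpus = [
    "ordinary training text",
    "text containing unwanted-pattern material",
    "more ordinary text",
]
print(filter_dataset(corpus))  # the flagged middle sample is dropped
```

In practice the scoring model would be run over the whole corpus before training, which is the point of the comment: it's cheaper to exclude data up front than to remove a learned behavior afterward.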
Dataset Filtering for Large Language Model Training Runs