Detecting Additional Forces Beyond Token Prediction in Fine-tuned LLMs

Many LLMs have already been fine-tuned and RLHF-trained for objectives other than "predict the next token a human would write". Given that, how would you tell whether the output was being driven by some additional force, over and above the base model and all that fine-tuning?