It does that because all of its training data in the final, post-training stage is of the form [question -> authoritative-sounding solution], where the solutions are written by humans. The LLM just imitates the form and style of that training data.
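To make the [question -> authoritative-sounding solution] shape concrete, here is a minimal sketch of what such post-training examples might look like once rendered as text. The `Example` class, the `TEMPLATE` string, and the two sample pairs are all hypothetical and chosen only for illustration; real post-training pipelines use their own formats.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One hypothetical post-training pair (illustrative, not from a real dataset)."""
    question: str
    solution: str  # written by a human, in a confident tone


# Assumed template; actual chat/instruction formats differ between models.
TEMPLATE = "Question: {question}\nAnswer: {solution}"


def format_example(ex: Example) -> str:
    """Render one pair as the text a model would be trained to reproduce."""
    return TEMPLATE.format(question=ex.question, solution=ex.solution)


examples = [
    Example("What is 2 + 2?", "2 + 2 equals 4."),
    Example("What is the capital of France?", "The capital of France is Paris."),
]

# Every rendered example has the same shape: a question followed by a confident answer.
training_texts = [format_example(ex) for ex in examples]

for text in training_texts:
    print(text)
    print("---")
```

Every item in this toy set has the same question-then-confident-answer layout, which is the form and style the paragraph above says the model ends up imitating.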
LLMs Imitate Training Data Format During Post-Training Stage