TIL: When distilling reasoning capability from a teacher LLM into a smaller LLM, you should use agent traces instead of CoT (chain-of-thought) traces. Advantages:
1. Increased generalization
Intuitively, this is because your agent can encounter more "surprising" results by interacting with its
Agent traces better than CoT for distilling reasoning
