2026 is the year of continual learning, and we are getting some amazing papers towards that. This paper introduces Self-Distillation Fine-Tuning (SDFT): on-policy continual learning from expert demonstrations, with no explicit reward inference or reward engineering. The trick here is:
Self-Distillation Fine-Tuning: On-Policy Continual Learning from Expert Demonstrations
