The LLM was then fine-tuned on these self-generated solutions as target outputs, which improved the general reasoning ability of a 540B-parameter LLM across multiple benchmarks without requiring any ground-truth labels.
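To make the data flow concrete, here is a minimal sketch of how a fine-tuning set could be assembled from a model's own solutions with no ground-truth labels. It rests on assumptions the text above does not state: `sample_solutions` is a hypothetical stand-in for a call to the model, and keeping only the solutions that agree with the majority answer is just one plausible way to choose targets.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical sampler type: given a question and a sample count, it returns
# (reasoning, final_answer) pairs generated by the model itself.
Sampler = Callable[[str, int], List[Tuple[str, str]]]


def build_self_training_set(
    questions: List[str],
    sample_solutions: Sampler,
    num_samples: int = 8,
) -> List[Dict[str, str]]:
    """Build (input, target) pairs from self-generated solutions.

    No ground-truth labels are used. Assumption: a solution is kept only
    if its final answer matches the most common answer among the samples.
    """
    dataset: List[Dict[str, str]] = []
    for question in questions:
        candidates = sample_solutions(question, num_samples)
        if not candidates:
            continue  # nothing was generated for this question
        # Most frequent final answer across the sampled solutions.
        answer_counts = Counter(answer for _, answer in candidates)
        majority_answer, _ = answer_counts.most_common(1)[0]
        for reasoning, answer in candidates:
            if answer == majority_answer:
                dataset.append(
                    {
                        "input": question,
                        "target": f"{reasoning} The answer is {answer}.",
                    }
                )
    return dataset
```

The resulting list of `input`/`target` pairs would then be passed to whatever supervised fine-tuning routine is in use; that step is left out here because it depends on the training stack.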
LLM Fine-Tuning with Self-Generated Solutions Improves Reasoning