You can now align reasoning LLMs with just 1K training examples! UC Santa Cruz released STAR-1, a 1K-example dataset, showing that fine-tuning Large Reasoning Models on it boosts safety performance by 40% on average while barely affecting reasoning ability.
STAR-1 Boosts Safety in Reasoning LLMs with Minimal Data