AI Dynamics

Global AI News Aggregator


MIT Improves Reasoning Model Calibration Through Reinforcement Learning

“Reinforcement Learning with Calibration Rewards”: What makes top reasoning models overconfident? MIT found that the reinforcement learning (RL) used to train these models rewards correct answers, not calibrated certainty. Training the models to also estimate their own confidence improved calibration while maintaining accuracy.

Source: original post on X by @mit_csail.
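The post does not spell out the reward, so what follows is a minimal Python sketch under one stated assumption: that a calibration reward adds a Brier-score penalty on the model's self-reported confidence to the usual binary correctness reward. All function names are hypothetical. The sketch shows why a correctness-only reward is indifferent to overconfidence, and why the Brier term, a proper scoring rule, makes reporting the true probability of being correct the reward-maximizing choice.

```python
# Hypothetical sketch, not MIT's implementation. It contrasts a
# correctness-only RL reward with one that also scores a stated confidence.
# ASSUMPTION: the calibration term is a Brier-score penalty; the post does
# not specify the exact reward used.


def correctness_reward(is_correct: bool) -> float:
    """Binary reward: 1.0 for a correct answer, 0.0 otherwise."""
    return 1.0 if is_correct else 0.0


def calibration_reward(is_correct: bool, confidence: float) -> float:
    """Correctness reward minus the Brier score of the stated confidence.

    `confidence` is the model's self-reported probability (0..1) that its
    answer is correct.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    outcome = correctness_reward(is_correct)
    brier = (confidence - outcome) ** 2  # squared gap between claim and outcome
    return outcome - brier


def expected_calibration_reward(p_correct: float, confidence: float) -> float:
    """Expected reward when the answer is correct with probability p_correct."""
    return (p_correct * calibration_reward(True, confidence)
            + (1.0 - p_correct) * calibration_reward(False, confidence))


def best_confidence(p_correct: float, steps: int = 100) -> float:
    """Grid-search the confidence that maximizes expected reward."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: expected_calibration_reward(p_correct, q))


if __name__ == "__main__":
    # Correctness-only: a wrong answer scores 0 whether the model claimed
    # 95% or 10% confidence, so overconfidence is never penalized.
    print(correctness_reward(False))
    # Calibration reward: the overconfident wrong answer scores much lower.
    print(calibration_reward(False, 0.95), calibration_reward(False, 0.10))
    # If an answer is right 70% of the time, reporting ~0.7 maximizes reward.
    print(best_confidence(0.7))
```

Under the assumed Brier penalty, the expected reward p(1 - (q - 1)^2) - (1 - p)q^2 has derivative -2(q - p) with respect to the stated confidence q, so it peaks exactly at q = p; the grid search in `best_confidence` checks this numerically. A correctness-only reward has no q term at all, which matches the post's point that plain RL gives the model no incentive to be calibrated.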