AI Dynamics

Global AI News Aggregator

RiskPO: Risk-Based Policy Optimization for LLM Post-Training

#PapersAccepted by Jiqizhixin
Our report: https://mp.weixin.qq.com/s/9TbUIT6ed_wOviVU0GuLqg
… RiskPO: Risk-based Policy Optimization via Verifiable Reward for LLM Post-Training (Peking University)
Paper: https://arxiv.org/abs/2510.00911v1

Code: https://github.com/RTkenny/RiskPO

→ View original post on X — @jiqizhixin