AI Dynamics

Global AI News Aggregator


WebSailor paper uses agentic RL post-training to boost Deep Research scores

The recent WebSailor paper by Alibaba-NLP shows how to post-train models for Deep Research, with useful insights first on building the dataset and then on the training recipe. I particularly like how agentic RL, applied at the end of post-training, improves scores by ~4 percentage points (p.p.) across the board: RL +

Original post on X by @aymericroucher