RL at today’s frontier doesn’t seem to wreck monitorability and can help early reasoning steps. But there’s a tradeoff: smaller models run with higher reasoning effort can be easier to monitor at similar capability — at the cost of extra inference compute (a “monitorability
RL Reasoning Tradeoff: Monitorability vs Inference Compute
By
–