Here's the breakthrough nobody expected: They introduced IPA (Interaction-Perceptive Agentic Policy Optimization) that assigns credit at the CHUNK level, not token level. Tokens are too fine. Trajectories are too coarse. Chunks align with actual tool-use semantics.
IPA: Chunk-Level Credit Assignment Breakthrough
By
–
