I think you're describing @OpenPipeAI. Check out their work on RULER (https://openpipe.ai/blog/ruler?refresh=1754513766765…), it's essentially a general purpose reward function. Might want to chat with them!
RULER: General Purpose Reward Function by OpenPipe AI