3/5 An example: in instance psf__requests-1724, the gold fix is 2 lines and our agent’s functional fix is 8 lines. The LLM judge rejected the correct 8-liner as "messy" and "redundant," choosing a clean but **non-functional** fix instead. See the full patch in the blog:
LLM Judge Rejects Functional Fix for Code Aesthetics