Define reward functions

In GRPO, we use deterministic functions to validate the response and assign a reward. No manual labelling required! The reward functions:

– Match format exactly
– Match format approximately
– Check the answer
– Check numbers

Check this out:
GRPO Reward Functions: Format Matching and Answer Validation
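
The four reward functions listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `<think>`/`<answer>` tags, the function names, and every reward value are hypothetical choices, since the text does not specify the response template or the scoring scale.

```python
import re

# Hypothetical response template: these tags are an assumption, not from the source.
REASONING_START, REASONING_END = "<think>", "</think>"
ANSWER_START, ANSWER_END = "<answer>", "</answer>"

# Whole response must be: reasoning block, then answer block, nothing else.
FORMAT_RE = re.compile(
    rf"^\s*{re.escape(REASONING_START)}.+?{re.escape(REASONING_END)}\s*"
    rf"{re.escape(ANSWER_START)}(.+?){re.escape(ANSWER_END)}\s*$",
    re.DOTALL,
)
# Answer block anywhere in the response (used by the lenient number check).
ANSWER_RE = re.compile(
    rf"{re.escape(ANSWER_START)}(.*?){re.escape(ANSWER_END)}", re.DOTALL
)
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def match_format_exactly(response: str) -> float:
    """Reward only when the whole response follows the template (value is illustrative)."""
    return 3.0 if FORMAT_RE.match(response) else 0.0


def match_format_approximately(response: str) -> float:
    """Partial credit: +0.5 for each tag that appears exactly once, -0.5 otherwise."""
    tags = (REASONING_START, REASONING_END, ANSWER_START, ANSWER_END)
    return sum(0.5 if response.count(tag) == 1 else -0.5 for tag in tags)


def check_answer(response: str, expected: str) -> float:
    """Compare the extracted answer text with the expected answer."""
    match = FORMAT_RE.match(response)
    if match is None:
        return 0.0  # nothing to extract, so no reward and no penalty
    guess = match.group(1)
    if guess == expected:
        return 3.0
    if guess.strip() == expected.strip():
        return 1.5  # correct apart from surrounding whitespace
    return -1.0


def check_numbers(response: str, expected: str) -> float:
    """Pull the first number out of the answer block and compare numerically."""
    block = ANSWER_RE.search(response)
    if block is None:
        return 0.0
    number = NUMBER_RE.search(block.group(1))
    if number is None:
        return 0.0
    try:
        return 1.5 if float(number.group()) == float(expected) else -0.5
    except ValueError:
        return 0.0  # expected answer is not numeric


if __name__ == "__main__":
    sample = "<think>2 + 2 = 4</think><answer>4</answer>"
    for fn in (match_format_exactly, match_format_approximately):
        print(fn.__name__, fn(sample))
    for fn in (check_answer, check_numbers):
        print(fn.__name__, fn(sample, "4"))
```

Each function here scores a single response string. A training library will usually expect a batch-oriented signature, so in practice these would likely need a thin wrapper that maps them over a list of completions. Splitting the format check into an exact and an approximate version gives the model a graded signal early in training, when fully well-formed responses are still rare.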
