Comparative prompt execution tracking for benchmark evolution
I'd love to be able to buy a one-off comparative prompt execution a few months from now, just to see how a new benchmark has evolved over time.