This one has just been a megaprompt, built through tons of riffing and experimenting and by working with a team of human evaluators on the test cases. It's not the kind of thing that can be accurately evaluated by LLMs or programmatically, so it's been fairly labor-intensive to get right. Also,
Megaprompt Development Through Human Evaluation and Experimentation