How are people testing?
Testing LLM apps is hard. How are people doing it?
1. We see that 83% of test runs have some form of feedback, which suggests that most people are finding metrics to evaluate against rather than just eyeballing the outputs.
2. We see an average of 2.3 pieces of feedback per run,
Testing LLM Applications: Metrics and Evaluation Methods