Sorry, and thanks for following up! For classification, I'd use accuracy; that should be pretty straightforward. If that doesn't work, I'd be curious to hear more! The second one is harder, but I'd use LLM-as-a-judge. You can define either a ground truth example to compare with, or just
LLM Classification Methods: Accuracy and Judge-Based Evaluation
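Here is a rough sketch of both ideas, assuming the classifier outputs plain string labels and that you have some way to call a judge model. None of these names come from a specific library: `accuracy`, `judge_with_reference`, `JUDGE_PROMPT`, and the `call_llm` hook are all made up for illustration, and the prompt wording is just one plausible choice. Only the ground-truth-comparison flavor of LLM-as-a-judge is shown.

```python
from typing import Callable, Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels.

    Assumption: labels are compared case-insensitively after trimming
    whitespace. Drop the normalization if your labels are case-sensitive.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, labels)
    )
    return correct / len(labels)


# Hypothetical judge prompt; tune the wording for your own task.
JUDGE_PROMPT = (
    "You are grading a model answer against a reference answer.\n"
    "Reference: {reference}\n"
    "Answer: {answer}\n"
    "Reply with exactly one word: PASS or FAIL."
)


def judge_with_reference(
    answer: str,
    reference: str,
    call_llm: Callable[[str], str],
) -> bool:
    """Ask a judge model whether `answer` agrees with `reference`.

    `call_llm` is a hypothetical hook: any function that sends a prompt
    to a model and returns its text reply.
    """
    reply = call_llm(JUDGE_PROMPT.format(reference=reference, answer=answer))
    # Anything that does not start with PASS is treated as a failure.
    return reply.strip().upper().startswith("PASS")
```

To try it, pass whatever client function you already use as `call_llm`; for a quick dry run, a stub such as `lambda prompt: "PASS"` is enough. Averaging the boolean verdicts over a test set gives a pass rate that can be read much like accuracy.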