It's really hard to evaluate LLM applications. The most direct way to do so is to gather feedback from the end user. Here's an in-depth walkthrough of using LangSmith to do that. Feedback is associated with traces, so you can easily debug bad results: https://github.com/langchain-ai/langsmith-cookbook/tree/main/feedback-examples/streamlit
…
Evaluating LLM Applications with LangSmith Feedback
