The flakiness is fundamental, which is one reason supervised models will be better for production where possible, imo. I think there needs to be good testing in place, and potentially resampling of the LLM when there's a parse failure. We're planning stuff to make that easier.
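To make the resampling idea concrete, here is a minimal sketch of a retry-on-parse-failure loop. It assumes the model is asked for JSON and is reachable through a zero-argument callable; `sample_until_parsed`, `ParseFailure`, and the `sample` callable are hypothetical names for illustration, not part of any particular library.

```python
import json
from typing import Any, Callable


class ParseFailure(Exception):
    """Raised when every attempt produced output that could not be parsed."""


def sample_until_parsed(
    sample: Callable[[], str],
    parse: Callable[[str], Any] = json.loads,
    max_attempts: int = 3,
) -> Any:
    """Call `sample` until `parse` accepts its output, up to `max_attempts` times.

    `sample` is a stand-in (an assumption) for whatever function queries the LLM
    and returns its raw text.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error = None
    for _ in range(max_attempts):
        raw = sample()
        try:
            return parse(raw)
        except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
            last_error = err  # remember the failure and resample

    raise ParseFailure(f"no parseable output in {max_attempts} attempts") from last_error


# Usage with a stubbed model: the first reply is malformed, the second parses.
replies = iter(["Sure! Here is the JSON:", '{"label": "spam"}'])
result = sample_until_parsed(lambda: next(replies))
```

Capping the attempts matters: if the prompt itself is the problem, resampling forever just burns tokens, so failing loudly after a few tries gives the tests something to catch.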
LLM Flakiness and Need for Supervised Models in Production