Another paper points out that older public benchmarks are inadequate for determining whether AI is actually good at tasks in domains like medicine. Models are clearly memorizing some answers or relying on heuristics to reach them. A new wave of benchmarks based on real-world data will help, but more are needed.
Medical AI Benchmarks: Beyond Memorization to Real-World Performance