What happens if you put a full scientific paper into an AI model and ask it to find known errors in proofs, tables, etc.? Every model before o3 fails completely; o3 gets 21% (it's better at proofs, worse at tables & figures). Progress, and perhaps a useful second opinion, but not yet autonomous science.
o3 Model Scores 21% at Detecting Known Errors in Scientific Papers