AI Dynamics

Global AI News Aggregator

Evaluating Base Models: Math, Code, and Reasoning Performance

what are some good evals for base models? will run! (FYI i expect it not to perform _that_ well, since the model card emphasizes they really only pretrained gpt-oss to be good at math/code/reasoning. also, my base model recovery process is likely somewhat lossy.)

→ View original post on X — @jxmnop