what are some good evals for base models? will run! (FYI i expect it not to perform _that_ well, since the model card emphasizes they really only pretrained gpt-oss to be good at math/code/reasoning. also, my base model recovery process is likely somewhat lossy.)
Evaluating Base Models: Math, Code, and Reasoning Performance