what are some good evals for base models? will run! (FYI i expect it to not perform _that_ well since the model card emphasizes they really only pretrained gpt-oss to be good at math/code/reasoning. also my base-model recovery process is likely somewhat lossy.)
@jxmnop
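One common way to eval a base model without relying on instruction-following is to skip generation and compare the log-likelihood the model assigns to each candidate answer. EleutherAI's lm-evaluation-harness, for example, scores multiple-choice tasks this way. A minimal sketch, assuming a hypothetical `logprob(context, continuation)` callable that you would back with the recovered base model (it is not a real library API):

```python
from typing import Callable, Iterable, Sequence, Tuple

# Hypothetical interface: total log-probability the base model assigns to
# `continuation` given `context`. Back this with your own model.
LogProbFn = Callable[[str, str], float]


def pick_answer(logprob: LogProbFn, context: str, choices: Sequence[str],
                length_normalize: bool = True) -> int:
    """Return the index of the choice the model considers most likely."""
    scores = []
    for choice in choices:
        score = logprob(context, choice)
        if length_normalize:
            # Per-character normalization, so longer answers are not
            # penalized merely for containing more tokens.
            score /= max(len(choice), 1)
        scores.append(score)
    return max(range(len(choices)), key=lambda i: scores[i])


def accuracy(logprob: LogProbFn,
             examples: Iterable[Tuple[str, Sequence[str], int]]) -> float:
    """examples: (context, choices, gold_index) tuples."""
    examples = list(examples)
    correct = sum(pick_answer(logprob, ctx, choices) == gold
                  for ctx, choices, gold in examples)
    return correct / len(examples)
```

Any function with that signature works, so the scoring logic can be checked against a fake scorer before wiring in the real model.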
-
Running AI Model Evaluations and Performance Benchmarks
can you give some examples? i bet it doesn't perform better, but i can run the evals!
-

Fixing Transformers GPT-OSS MoE Finetuning Issues
"i thought the transformers gpt-oss MoE finetuning was broken, how did you get it working?"
-

Developer Creates Triton Kernel After OpenAI Frustration
new blog! i got mad at openai and vibe-coded a triton kernel
-

RL Scaling Progress: Engineering Breakthrough in Model Training
i am GIDDILY following the open progress in RL scaling. notable that it takes such a herculean engineering effort to train a model for 3000 steps. but these results are undeniable, esp. with shortened context. also, it's only been two months since v1… hard to keep up…
-

Developer Reverses GPT-OSS Reinforcement Learning, Releases Base Model
figured out how to "undo" the RL and turn gpt-oss back into a base model. will drop the weights tomorrow. gn
-
TorchTitan MoE Implementation with FSDP and Torch Requirements
torchtitan has an MoE impl that supports grouped mm and composes with FSDP: https://github.com/pytorch/torchtitan/blob/main/torchtitan/models/moe.py
… needs the latest torch version though (2.8), which flash-attn doesn't have a wheel for yet 🙁
-

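As we read it, the "grouped mm" in the torchtitan post above means running every expert's matmul in a single call over tokens that have already been sorted by their assigned expert. The sketch below only spells out those semantics in plain Python; the `grouped_mm` name and the cumulative `offsets` convention are illustrative assumptions, not torchtitan's or PyTorch's actual API.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense (m x k) @ (k x n) -> (m x n)."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]


def grouped_mm(tokens: Matrix, expert_weights: List[Matrix],
               offsets: List[int]) -> Matrix:
    """Reference semantics of a grouped matmul for MoE experts.

    Assumes `tokens` are already sorted by assigned expert and that
    offsets[e] is the exclusive end row of expert e's group (a hypothetical
    cumulative convention). A fused kernel handles all groups in a single
    launch; this loop only spells out what it computes.
    """
    if len(offsets) != len(expert_weights) or (offsets and offsets[-1] != len(tokens)):
        raise ValueError("offsets must have one entry per expert and end at len(tokens)")
    out: Matrix = []
    start = 0
    for weight, end in zip(expert_weights, offsets):
        group = tokens[start:end]
        if group:  # an expert can receive zero tokens
            out.extend(matmul(group, weight))
        start = end
    return out
```

The point of the grouped form is that experts receiving different numbers of tokens can be handled in one call, without padding every group to a common size.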
METR and GPT-5 Impress with Latest AI Advancements
thank you! after looking more into this i've been incredibly impressed, both by METR and GPT-5 haha
-

GPT-5 Task Completion Claims: Skepticism on Four-Hour Task Automation
what are some examples of the tasks behind this chart? it says GPT-5 can complete 50% of tasks that take four hours. i don't think anyone truly believes this
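For context on the chart: as we understand METR's published methodology, the "50% time horizon" is the human completion time at which a logistic curve, fit to a model's success rate versus log human task length, crosses 50%. So the headline number is a point on a fitted curve rather than a success rate measured on one fixed set of four-hour tasks. A minimal sketch of that kind of fit by plain gradient descent, assuming made-up toy success rates (not METR data) and omitting details of the real analysis such as confidence intervals:

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_time_horizon(human_hours: Sequence[float], success: Sequence[float],
                     steps: int = 20000, lr: float = 0.1) -> float:
    """Fit P(success) = sigmoid(a + b * (log2(t) - mean)) by gradient descent
    on the logistic loss, then return the time t where P(success) = 0.5."""
    xs = [math.log2(t) for t in human_hours]
    mean = sum(xs) / len(xs)
    us = [x - mean for x in xs]  # centering improves conditioning
    a = b = 0.0
    n = len(us)
    for _ in range(steps):
        ga = gb = 0.0
        for u, y in zip(us, success):
            err = sigmoid(a + b * u) - y  # gradient of the log-loss w.r.t. the logit
            ga += err
            gb += err * u
        a -= lr * ga / n
        b -= lr * gb / n
    # sigmoid(a + b*u) = 0.5  <=>  a + b*u = 0  <=>  u = -a / b
    return 2 ** (mean - a / b)
```

With toy success rates that fall off symmetrically around a 4-hour task length, the fitted horizon comes out at 4 hours; the individual tasks at that length could still have success rates well above or below 50%.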
-
Embedding, Clustering and Topic Modeling Workflow
actually much simpler:
1. embed data
2. cluster
3. run topic model on clusters

courtesy of @nomic_ai!
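The three steps above can be sketched end to end with the standard library. This is a toy stand-in, not Nomic's tooling: the "embedding" is an L2-normalized bag-of-words vector (a real pipeline would call a neural embedding model), clustering is plain k-means with deterministic farthest-first initialization, and the "topic model" is a crude class-based TF-IDF in the spirit of BERTopic's c-TF-IDF. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def embed(doc: str, vocab: Sequence[str]) -> Vector:
    """Stand-in embedding: L2-normalized bag-of-words counts."""
    counts = Counter(doc.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain k-means with farthest-first initialization (deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels: List[int] = []
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def topic_words(docs: Sequence[str], labels: Sequence[int], k: int,
                top_n: int = 3) -> Dict[int, List[str]]:
    """Label each cluster with words that are frequent in it but appear in
    few other clusters (a crude class-based TF-IDF)."""
    per_cluster = {c: Counter() for c in range(k)}
    for doc, lab in zip(docs, labels):
        per_cluster[lab].update(doc.lower().split())
    # df[w] = number of clusters in which word w appears at all
    df = Counter(w for counts in per_cluster.values() for w in counts)
    topics = {}
    for c, counts in per_cluster.items():
        ranked = sorted(counts, key=lambda w: (-counts[w] * math.log(1 + k / df[w]), w))
        topics[c] = ranked[:top_n]
    return topics


def embed_cluster_topics(docs: Sequence[str], k: int) -> Tuple[List[int], Dict[int, List[str]]]:
    vocab = sorted({w for d in docs for w in d.lower().split()})
    points = [embed(d, vocab) for d in docs]      # 1. embed data
    labels = kmeans(points, k)                    # 2. cluster
    return labels, topic_words(docs, labels, k)   # 3. topic-model each cluster
```

Swapping in real embeddings or a different clusterer only changes steps 1 and 2; the per-cluster labeling step stays the same.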