@jxmnop

–

12 August 2025 21h16

"i thought the transformers gpt-oss MoE finetuning was broken, how did you get it working?"

Developer Creates Triton Kernel After OpenAI Frustration

12 August 2025 16h07

new blog! i got mad at openai and vibe-coded a triton kernel

RL Scaling Progress: Engineering Breakthrough in Model Training

12 August 2025 14h55

i am GIDDILY following the open progress in RL scaling. notable that it takes such a herculean engineering effort to train a model for 3000 steps. but these results are undeniable, esp with shortened context. also it's only been two months since v1… hard to keep up…

Developer Reverses GPT-OSS Reinforcement Learning, Plans to Release Base Model

12 August 2025 4h52

figured out how to "undo" the RL and turn gpt-oss back into a base model. will drop the weights tomorrow. gn

TorchTitan MoE Implementation with FSDP and Torch Requirements

12 August 2025 4h32

torchtitan has an MoE impl that supports grouped mm and composes with FSDP: https://github.com/pytorch/torchtitan/blob/main/torchtitan/models/moe.py
… needs the latest torch version though (2.8) which flash-attn doesnt have a wheel for yet 🙁

METR and GPT-5 Impress with Latest AI Advancements

11 August 2025 21h19

thank you! after looking more into this i've been incredibly impressed, both by METR and GPT-5 haha

11 August 2025

GPT-5 Task Completion Claims: Skepticism on Four-Hour Task Automation

11 August 2025 2h59

what are some examples of the tasks behind this chart? it says GPT-5 can complete 50% of tasks that take four hours. i dont think anyone truly believes this