Ran autoresearch on HF to see whether anything can beat the MuonAdamW baseline. Biggest takeaway: NS orthogonalization is a very strong attractor that absorbs most gradient modifications you throw at it. See all the artifacts at huggingface.co/datasets/mish…
→ View original post on X — @clementdelangue, 2026-04-10 13:23 UTC
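
One way to see why orthogonalization could absorb gradient modifications is sketched below. It assumes "NS" means Newton-Schulz iteration, and it uses the classical cubic form rather than the tuned polynomial a real Muon implementation may use. The matrix values and every function name are hypothetical, chosen only for illustration, and the code is dependency-free Python rather than anything taken from the linked artifacts. It shows a single mechanism: any modification that changes only the singular values of the gradient (a rescale, or a reshaping of the spectrum) leaves the orthogonalized update essentially unchanged.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_norm(a):
    return sum(x * x for row in a for x in row) ** 0.5


def newton_schulz(g, steps=40):
    """Classical cubic Newton-Schulz iteration: X <- 1.5 X - 0.5 X X^T X.

    After dividing by the Frobenius norm, all singular values are at most 1,
    and the iteration pushes each nonzero singular value toward 1 while
    keeping the singular vectors fixed. (Assumption: this is the textbook
    variant, not necessarily the one used in the post's experiments.)
    """
    norm = frobenius_norm(g)
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(r1, r2)] for r1, r2 in zip(x, xxtx)]
    return x


def max_abs_diff(a, b):
    return max(abs(x - y) for r1, r2 in zip(a, b) for x, y in zip(r1, r2))


# A made-up, full-rank "gradient" matrix.
grad = [[2.0, 0.5, 0.0],
        [0.1, 1.0, 0.3],
        [0.0, 0.2, 0.5]]

# Modification 1: rescale the gradient by a constant.
scaled = [[10.0 * v for v in row] for row in grad]

# Modification 2: G G^T G keeps the singular vectors of G but cubes its
# singular values, so the spectrum is reshaped substantially.
reshaped = matmul(matmul(grad, transpose(grad)), grad)

base = newton_schulz(grad)
print("difference after rescaling:", max_abs_diff(base, newton_schulz(scaled)))
print("difference after reshaping:", max_abs_diff(base, newton_schulz(reshaped)))
```

Both printed differences should be close to floating-point noise, because all three inputs share the same singular vectors. Modifications that rotate the singular vectors would not be erased this way, so this is only a partial picture of the behavior the post describes.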
