If you look carefully, you can spot the exact times when Mythos was breaking out of its sandbox
@petergostev
-
Delayed Message Feature Request for Background Job Notifications
Codex feature request: 'delayed message'. I often have long-running jobs in the background and I want to send a 'how is it going?' message in, say, 20 minutes or an hour. I don't want to create an automation for this; just a delay is enough. It would be a nice quality-of-life feature.
-
Arena Rankings Leaked: Three Years of Leaderboard Data Exposed
Someone LEAKED Arena's full rankings going back 3 years across all leaderboards. These security incidents are getting out of control.
-
New Models Added to BullshitBench: Qwen Performance Analysis
I did a big cleanup of some new models to add to BullshitBench. None of them are particularly interesting, tbh. Qwen scored relatively well, but below Qwen 3.5.
-
Data Viewer Tool for AI Model Performance Benchmarking
Data viewer: https://petergpt.github.io/bullshit-benchmark/viewer/index.v2.html …
GitHub:
-
Mythos Model Card: 244 Pages of Detailed AI Documentation
What's interesting about Mythos is that it has a very detailed and thorough 244-page model card. Maybe Mythos made it for them in a day, but it feels like maybe 6-8 weeks of work, so the model was probably ready for c. 2 months. I wonder if they have used it for coding already.
-
Interactive Website for Compute Wars Competition Platform
Interactive website: https://compute-wars.surge.sh
GitHub:
-
Anthropic Google Deal Narrows OpenAI AI Capacity Lead