python is great. also i love swe bench btw, just might not be the highest-signal point of comparison between Claude and ChatGPT these days
@jxmnop
-
SWEbench Scores Inflated by Django Training Data Bias
good time to remind everyone that a high score on SWEbench really just indicates the training data contained a sufficiently large proportion of Django
-

GPT-5 Long-Context Training Methods and Capabilities
most impressive part of GPT-5 is the jump in long-context. how do you even do this? produce some strange long-range synthetic data? scan lots of books?
-
Business Success and Failure Predictability Patterns
yeah that’s right. just seems so obvious that certain businesses are going to fail (or succeed) sometimes.
-
VC Funding Waste: Why AI Startups Burn Millions Without Impact
the VC world provides a lot of value but sometimes it feels like they just set money on fire. several startups i know raised ~100M total three years ago to make AI, built software nobody ever used, and now they all work elsewhere on unrelated things. where'd all that money go?
-
Author-Reviewer Effort Asymmetry in Academic Peer Review
it seems usually the other way around
• authors spend incredible amounts of time optimizing rebuttal responses
• reviewers spend little time reading (sometimes just click the button)
(some reviewers are good though)
-
NeurIPS Rebuttals: Researchers Share Frustrations During Review Season
strange in the social media era to feel that everyone is working on NeurIPS reviews & rebuttals rn but no one is talking about it publicly. i contributed to three rebuttals. one of the three was extremely frustrating. who else is working on rebuttals? how's it going for you?
-
Sam Altman's Ideal Model: 3200 on Codeforces, No Idea Who Sam Altman Is
sam altman wants to build a model that’s rated 3200 on codeforces but has no idea who Sam Altman is and i think that’s beautiful
-
GPT-OSS Model Performance: Coding Excellence Mixed with Factual Hallucinations
i’ve spent the last couple hours talking to gpt-oss and can safely say it’s unlike any model i’ve tested. one second it’s coding for me at a professional level, the next it’s making up basic facts and clinging to them no matter what i say. something very strange is going on
-
Choosing Your Own Reward Function in AI Systems
at least i get to choose my own reward function