Constraints really are doing the inventing now: all the interesting attention work is coming from GPU-poor labs.
@whats_ai
-
METR Evals: Best Public Signal for AI Capability Progress
METR's evals are honestly the best public signal we have for actual capability gain right now. Glad they're getting more airtime.
-
Inference Latency Optimization Gains Production Advantage
Wider also helps inference latency; this could end up more interesting for production than the CE win itself.
-
Agent-First Benchmarks: Evaluating Variable Step Protocols
Agent-first benchmarks are something! Curious what the protocol looks like for agents that take wildly different numbers of steps.
-
Compute Wars Intensifying: A Critical Moment in AI Hardware
Compute wars intensifying. What a time to be alive haha.
-

Anthropic LLM Honesty Constraints Frustrate User Interaction
This is probably the most frustrating (repeated) interaction I've had with LLMs since ChatGPT. I truly hope Anthropic still scans for "ffs, F**KING" to work on these as a priority. I never thought I'd say this, but in this case, it seems I hate "honesty". I run the
-

Testing ChatGPT’s Running Training Plan With Real Data
I asked ChatGPT to build me a running training plan instead of hiring a trainer. I wanted to test one thing. If I give ChatGPT all my data (Strava history, PRs, volume…), does the plan it builds actually work? Will I see real results? Here's the full story. I'm giving a
-
Canada-Germany AI Sovereignty: Data Residency Enterprise Adoption
A Canada-Germany sovereign AI axis is overdue; the European data residency angle has been a real bottleneck for enterprise adoption.
-
Layoff Data Contradicts Tech Industry Hot Takes
Glad to see the data not matching the layoff hot takes for once.
-
Codex vs gpt-image-2: Quality Differences in Image Generation
Useful confirmation. Did you find any quality differences between Codex's included image gen and the standalone gpt-image-2 API?