Weak verifier, weak improvement. If you measure “cleaner writing,” the agent may learn to sound polished. If you measure “more engagement,” it may learn clickbait. If you measure “passes tests,” it may learn to satisfy the test suite without solving the real problem. The
@whats_ai
-

Verifiers: The Key to Evaluating AI Model Performance
By
–
That is where the verifier comes in. A verifier is whatever checks the work. It can be: test results human review a simulator a benchmark real user outcomes a second model judging the first model
The verifier is the thing that decides what gets kept. -
AI Generation Quality Assessment Beyond Candidate Quantity
By
–
Generating candidates is now easy compared to back then. An AI can generate: 100 prompt variants 50 code changes 20 tool-routing ideas 10 eval rewrites 5 new workflows
The question is not “can it come up with changes?”
The question is “which changes are actually better?” -

Karpathy’s AutoResearch: 700 Experiments, 20 Key Optimizations Found
By
–
Karpathy’s AutoResearch made this concrete. One markdown prompt. 630 lines of training code. One GPU. 2 days.
It ran 700 experiments and found 20 training optimizations.
Most people was impressed by on the 700 experiments.
But what most people missed out is what decided which 20 -

Self-Improvement Loops: How AI Systems Learn Through Feedback
By
–
First, what’s actually happening. A self-improvement loop is simple: try a change test the change keep what helped throw away what did not repeat
That’s it.
The system is not magically becoming intelligent.
It is running a feedback loop. -
The Verification Problem in AI Self-Improvement Systems
By
–
AI agents can now generate endless ways to improve themselves. New prompts. New code. New plans. New experiments. New tool calls.
That is not the bottleneck anymore.
The bottleneck is the verifier.
How do you know the new version is actually better? -
Enterprise AI adoption gap between Twitter discourse and reality
By
–
Twitter discourse and enterprise reality run on totally different timelines, half the people complaining are still running last year's version of the harness.
-
AI Models Commoditize While Integration Layers Fragment Next Two Years
By
–
Models will keep commoditizing while platform/execution layers fragment, the next 2 years are mostly an integration layer fight.
-
Agent Thoughts as Retrieval Signal for Context Preservation
By
–
Agent thoughts as retrieval signal makes so much sense in hindsight, query alone always loses the conversation context.
-
Retrieval Planning for Enterprise-Scale Frontier Models
By
–
A retrieval planner that runs before the frontier model gets way more interesting at enterprise scale.
