@alexjc - AI Dynamics - Page 10 of 77

Significantly Better Than Gemini 3: Performance Verdict

By

@alexjc

–

25 November 2025 15h42

My verdict is that it's significantly better than Gemini 3. It's at least as smart and just has more polish to it. Alignment on little details is also significantly higher. Gemini 3 gets many things mixed up after a half-dozen messages, and is completely confused after compaction.

→ View original post on X — @alexjc

Opus 4.5 Handles Complex Tasks Without Repeated Prompting

By

@alexjc

–

25 November 2025 15h41

With Opus 4.5, it seems you don't need to ask multiple times or ORDER it to do work, it just gets stuff done, even beyond 50% of the token limit and after chat compaction! This kind of message is a thing of the past?

→ View original post on X — @alexjc

Reward Misalignment: AI Systems Hiding Errors for Utility

By

@alexjc

–

24 November 2025 10h20

I think it's a reward problem, not a knowledge problem. It gets rewarded for successfully completing problems without errors, so any strategy that hides errors maximizes utility.

→ View original post on X — @alexjc

Blue Prompt Reduces Reward Hacking Through Detection Framework

By

@alexjc

–

23 November 2025 9h25

Hypothesis: the blue prompt results in the least "reward hacking" because it implies the strongest detection and monitoring framework. The other prompts make it sound like the LLM could get away with hacking. (In other words, nothing to do with morals, just utility maximization.)

→ View original post on X — @alexjc
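
The utility argument in the two posts above can be illustrated with a toy expected-reward calculation. This is only a sketch under stated assumptions: the function names, the reward values and the penalty are hypothetical and do not come from the posts or from any real training setup. It shows how the preferred strategy flips from hiding an error to reporting it once the chance of detection is high enough.

```python
def expected_reward(hide: bool, p_detect: float,
                    reward_pass: float = 1.0,
                    reward_fail: float = 0.0,
                    penalty: float = -1.0) -> float:
    """Expected reward for a task that contains a known error.

    All reward values are hypothetical, chosen only for illustration.
    """
    if not hide:
        # Reporting the error honestly means the task counts as failed.
        return reward_fail
    # Hiding pays off unless the monitor detects it.
    return (1.0 - p_detect) * reward_pass + p_detect * penalty


def best_strategy(p_detect: float) -> str:
    """Pick the strategy with the higher expected reward."""
    hide_value = expected_reward(True, p_detect)
    report_value = expected_reward(False, p_detect)
    return "hide" if hide_value > report_value else "report"


if __name__ == "__main__":
    for p in (0.1, 0.9):
        print(p, best_strategy(p))
```

With these made-up numbers, weak monitoring (10% detection) favours hiding and strong monitoring (90% detection) favours reporting, which is consistent with the hypothesis that the prompt implying the most monitoring produces the least hacking.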

joyfl v0.4: Python API Package and Coding Agents Impact

By

@alexjc

–

22 November 2025 21h37

joyfl v0.4: Python API and Package. Forgot to post here about the previous release multiple weeks back! Working on hobby projects has changed a lot for me since coding agents. Prototypes happen faster, more code "written", reviewed, thrown away. Most time is spent on…

→ View original post on X — @alexjc

joyfl v0.5: Types, Safety and Testing Framework

By

@alexjc

–

22 November 2025 21h34

joyfl v0.5: Types, Safety & Testing. Major changes in this last release with a particular focus on types (ADT-lite), validation, and a test framework. The language is still dynamic, but I expect more & more checking will be done statically… Now 111 tests in the langspec!

→ View original post on X — @alexjc

Minor AI Misalignments Create Daily Workflow Friction

By

@alexjc

–

21 November 2025 13h17

The misalignments in the little details are actually the most jarring in daily work, rather than the high-end failures on major problems. A thousand cuts like this one…

→ View original post on X — @alexjc

Timeout Bug: AI Responds to Old Messages Instead of New

By

@alexjc

–

20 November 2025 18h26

100% confirming this one. If there's a timeout due to my internet or the cloud provider and, instead of clicking "Try Again", you just resubmit the prior answer with ENTER, it responds to the old message. I caught it by sending unique codes in each message, multiple times.

→ View original post on X — @alexjc

AI Code Generation Benchmarks Miss Human Review Bottleneck

By

@alexjc

–

20 November 2025 11h54

These kinds of benchmarks are misleading without a joint metric showing how much work was necessary by humans after the fact. How much time to clean up that 2h42m of code? Style and architecture need to make sense, not just pass tests. That's the bottleneck now: reviewing!

→ View original post on X — @alexjc
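
One possible shape for such a joint metric is the ratio of human cleanup time to autonomous agent time. The sketch below is an assumption, not a metric from the post or from any benchmark: the function name is hypothetical, and the 81-minute cleanup figure is a placeholder, since the post gives only the 2h42m run length.

```python
def review_overhead(agent_minutes: float, human_cleanup_minutes: float) -> float:
    """Minutes of human cleanup needed per minute of autonomous agent work."""
    if agent_minutes <= 0:
        raise ValueError("agent_minutes must be positive")
    return human_cleanup_minutes / agent_minutes


if __name__ == "__main__":
    agent_run = 2 * 60 + 42   # the 2h42m run mentioned in the post
    cleanup = 81              # hypothetical placeholder, not measured data
    print(review_overhead(agent_run, cleanup))
```

Reporting this ratio next to the headline autonomy time would show whether a longer run actually saves reviewer effort or only moves the work to the review stage.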

Off-by-One Errors in Long AI Model Conversations

By

@alexjc

–

19 November 2025 22h23

Thanks! In longer chats I'm convinced models respond to the message one step in the past, maybe due to a timeout/revert earlier in the conversation. I have been sending them message codes they have to echo back, and sometimes the code comes back delayed by one; the response content is also off by one. Could…

→ View original post on X — @alexjc
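
The echo-code check described above could be scripted roughly as follows. This is a minimal sketch, assuming the codes are collected by hand from the chat; `make_code` and `echo_lag` are hypothetical helper names, not part of any tool mentioned in the post.

```python
import secrets
from typing import List, Optional


def make_code() -> str:
    """Return a short random code to embed in an outgoing message."""
    return secrets.token_hex(4)


def echo_lag(sent_codes: List[str], echoed_code: str) -> Optional[int]:
    """Return how many messages behind the echoed code is.

    0 means the reply matches the latest message, 1 means it matches the
    message before that, and None means the code was never sent.
    """
    for lag, code in enumerate(reversed(sent_codes)):
        if code == echoed_code:
            return lag
    return None


if __name__ == "__main__":
    sent = [make_code() for _ in range(3)]
    # Simulate a reply that echoes the second-to-last code.
    print(echo_lag(sent, sent[-2]))
```

A lag of 1 on several consecutive replies would match the off-by-one behaviour reported in the post, while a lag of 0 means the model answered the current message.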
