@askalphaxiv - AI Dynamics

Fixed Point Reasoners for Stable and Adaptive Looped Transformers

By

@askalphaxiv

–

21 June 2026 16h56

“Fixed Point Reasoners: Stable and Adaptive Deep Looped Transformers” While Looped Transformers can devote more depth to harder problems, they still need a good method to know when to stop. This article makes of…
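As a rough illustration of the stopping problem, here is a minimal Python sketch of a generic fixed-point halting rule. A single weight-tied block is applied repeatedly, and the loop stops once the update norm falls below a tolerance. The toy contraction map, `make_contraction`, `loop_until_fixed_point`, and the `tol`/`max_loops` values are hypothetical stand-ins. The excerpt does not state the paper's actual halting criterion.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def make_contraction(rate: float, offset: Vector) -> Callable[[Vector], Vector]:
    """Hypothetical toy 'block': h -> rate * h + offset, a contraction when |rate| < 1."""
    return lambda h: [rate * x + c for x, c in zip(h, offset)]


def loop_until_fixed_point(
    block: Callable[[Vector], Vector],
    h0: Vector,
    tol: float = 1e-6,
    max_loops: int = 1000,
) -> Tuple[Vector, int]:
    """Reapply the same block until ||h_n - h_{n-1}|| < tol; return (state, loops used)."""
    h = list(h0)
    for n in range(1, max_loops + 1):
        h_next = block(h)
        # Euclidean size of this loop's update; a small update means h is near a fixed point.
        delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(h_next, h)))
        h = h_next
        if delta < tol:
            return h, n
    return h, max_loops  # budget exhausted without meeting the tolerance


if __name__ == "__main__":
    easy = make_contraction(0.5, [1.0, -1.0])  # fixed point at [2, -2]
    hard = make_contraction(0.9, [1.0, -1.0])  # fixed point at [10, -10], converges more slowly
    for name, block in [("easy", easy), ("hard", hard)]:
        h, loops = loop_until_fixed_point(block, [0.0, 0.0])
        print(name, loops, [round(x, 4) for x in h])
```

Under this assumption, the slower-contracting block (rate 0.9 instead of 0.5) takes more loops before it halts. That is the sense in which depth adapts to difficulty here.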

→ View original post on X — @askalphaxiv

21 June 2026

Do as I Do: Dexterous Manipulation Data from Everyday Human Videos

By

@askalphaxiv

–

21 June 2026 10h07

“Do as I Do: Dexterous Manipulation Data from Everyday Human Videos” Robot dexterity is bottlenecked by data: teleoperation and MoCap are expensive, and internet videos are only observational. This paper turns ordinary RGB human videos into executable robot hand…

→ View original post on X — @askalphaxiv

21 June 2026

Research on anthropomorphic misalignment requires stronger evidence

By

@askalphaxiv

–

19 June 2026 18h48

“Position: Research on anthropomorphic misalignment needs stronger evidence” AI safety papers often claim that models are deceptive, calculating, or concerned with self-preservation, but many results do not show…

→ View original post on X — @askalphaxiv

19 June 2026

Q-Learning by Inversion for Flow Policies in Offline RL

By

@askalphaxiv

–

19 June 2026 18h47

“Q-Learning by Inversion” Flow policies should be excellent for offline RL, but their iterative generation of actions makes RL training painful. This paper transforms each flow step into an RL step, then uses inversion to recover…
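The idea of treating each flow step as an RL step can be sketched as follows. This assumes a flow policy that generates an action by Euler-integrating a velocity field from a noise sample over `num_steps` steps. Each integration step is recorded as one (state, action, next state) transition, where the state is the current sample plus flow time and the action is the velocity. This framing, `flow_rollout_as_mdp`, and the toy constant velocity field are assumptions for illustration. The inversion step mentioned in the excerpt is cut off and is not reproduced here.

```python
from typing import Callable, List, Tuple

Vector = List[float]
State = Tuple[Tuple[float, ...], float]  # (current sample, flow time)
Transition = Tuple[State, Tuple[float, ...], State]


def flow_rollout_as_mdp(
    x0: Vector,
    velocity: Callable[[Vector, float], Vector],
    num_steps: int,
) -> Tuple[Vector, List[Transition]]:
    """Euler-integrate a flow from noise x0 at t=0 to t=1, logging each step as an MDP transition."""
    dt = 1.0 / num_steps
    x = list(x0)
    transitions: List[Transition] = []
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)  # the "action" chosen at this flow step
        x_next = [xi + dt * vi for xi, vi in zip(x, v)]
        transitions.append(((tuple(x), t), tuple(v), (tuple(x_next), t + dt)))
        x = x_next
    return x, transitions  # final x is the action the policy would execute


if __name__ == "__main__":
    # Hypothetical constant velocity field: a straight path away from the noise sample.
    action, steps = flow_rollout_as_mdp([0.0, 0.0], lambda x, t: [1.0, -2.0], num_steps=4)
    print(action, len(steps))
```

With this view, a standard RL method sees `num_steps` short transitions per generated action instead of one long, opaque sampling procedure.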

→ View original post on X — @askalphaxiv

19 June 2026

Making Depth Reusable by Looping World Models

By

@askalphaxiv

–

18 June 2026 17h31

“World Models in a Loop” World models require deep computations for stable long-horizon simulation, but deeper models are costly and errors accumulate over deployments. This paper makes depth reusable by looping…
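Here is a toy sketch of reusable depth. It assumes a world model whose per-step transition is a single weight-tied block applied `loops` times, so the parameter count stays fixed no matter how many loops each simulated step uses. `LoopedBlock`, `step`, `rollout`, and the tanh layer are hypothetical and not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


class LoopedBlock:
    """Hypothetical weight-tied layer: h -> tanh(W h + b). Depth comes from reapplying it."""

    def __init__(self, weight: List[Vector], bias: Vector):
        self.weight = weight
        self.bias = bias

    def num_params(self) -> int:
        return sum(len(row) for row in self.weight) + len(self.bias)

    def __call__(self, h: Vector) -> Vector:
        return [
            math.tanh(sum(w * x for w, x in zip(row, h)) + b)
            for row, b in zip(self.weight, self.bias)
        ]


def step(block: LoopedBlock, state: Vector, action: Vector, loops: int) -> Vector:
    """One simulated transition: mix in the action, then run the shared block `loops` times."""
    h = [s + a for s, a in zip(state, action)]
    for _ in range(loops):
        h = block(h)
    return h


def rollout(block: LoopedBlock, state: Vector, actions: List[Vector], loops: int) -> List[Vector]:
    """Roll the world model forward over a sequence of actions, returning every visited state."""
    trajectory = [state]
    for action in actions:
        state = step(block, state, action, loops)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    block = LoopedBlock([[0.5, 0.0], [0.0, 0.5]], [0.0, 0.1])
    shallow = rollout(block, [0.0, 0.0], [[0.1, 0.2], [0.3, -0.1]], loops=2)
    deep = rollout(block, [0.0, 0.0], [[0.1, 0.2], [0.3, -0.1]], loops=8)
    print(block.num_params(), len(shallow), len(deep))  # same parameters, more compute per step
```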

→ View original post on X — @askalphaxiv

18 June 2026

Research agents excel at resolving implementation issues, not novel discoveries

By

@askalphaxiv

–

18 June 2026 15h02

1/N Model capabilities today are subpar for end-to-end autonomous research, what we would categorize as the ability to make “novel discoveries”. However, in our experimentation, research agents have proven to be an excellent way to resolve implementation issues and carry out…

→ View original post on X — @askalphaxiv

18 June 2026

Introducing autoresearch for arXiv papers via autoarxiv

By

@askalphaxiv

–

18 June 2026 15h02

Introducing autoresearch for arXiv papers

Change 'arxiv' to 'autoarxiv' in any paper URL

An agent deploys to resolve setup issues on the codebase, run a minimal reproduction, and estimate full replication cost. Read more below pic.twitter.com/2UHVqbT7Eu
— alphaXiv (@askalphaxiv) 18 June 2026

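The URL trick amounts to a hostname rewrite. The following is a minimal sketch. It assumes the target host is `autoarxiv.org`, which is inferred from the stated 'arxiv' to 'autoarxiv' substitution and not confirmed by the post. It also assumes that only the hostname should change, so paths and query strings are preserved. `to_autoarxiv` and `ARXIV_HOSTS` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

ARXIV_HOSTS = {"arxiv.org", "www.arxiv.org"}


def to_autoarxiv(url: str) -> str:
    """Swap 'arxiv' for 'autoarxiv' in the hostname of an arXiv paper URL.

    Assumes the autoarxiv host mirrors arXiv's paths (e.g. /abs/<id>); this is
    inferred from the announcement, not from documented behaviour.
    """
    parts = urlsplit(url)
    if parts.netloc not in ARXIV_HOSTS:
        raise ValueError(f"not an arXiv URL: {url}")
    host = parts.netloc.replace("arxiv.org", "autoarxiv.org")
    # Rebuild the URL with only the hostname changed; path, query, and fragment are kept.
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(to_autoarxiv("https://arxiv.org/abs/2406.01234"))  # placeholder paper ID
```

For the placeholder ID above, this returns `https://autoarxiv.org/abs/2406.01234`. The function rewrites only the hostname rather than calling `str.replace` on the whole URL, so other occurrences of "arxiv" in the path are left alone.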

→ View original post on X — @askalphaxiv

18 June 2026

Variable-Width Transformers: Wide at Start and End, Narrow in the Middle

By

@askalphaxiv

–

18 June 2026 5h43

This article shows that width should be allocated unevenly, with models wide at the beginning and end but narrow in the middle. The bottleneck thus forces better use of representations instead of wasting the…
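To make the width profile concrete, here is a hypothetical per-layer width schedule that is wide at the first and last layers and narrow at the midpoint. The cosine shape, `u_shaped_widths`, and the rounding to multiples of 64 are assumptions. The excerpt only says that width should be wide at both ends and narrow in the middle.

```python
import math
from typing import List


def u_shaped_widths(num_layers: int, w_max: int, w_min: int, multiple: int = 64) -> List[int]:
    """Per-layer hidden widths that are wide at both ends and narrow in the middle.

    Hypothetical cosine profile: width is w_max at the first and last layer and
    w_min at the midpoint, rounded to the nearest multiple of `multiple`.
    """
    if num_layers < 2:
        return [w_max] * num_layers
    widths = []
    for i in range(num_layers):
        phase = 2 * math.pi * i / (num_layers - 1)
        frac = 0.5 * (1 + math.cos(phase))  # 1 at both ends, 0 at the midpoint
        w = w_min + (w_max - w_min) * frac
        widths.append(int(round(w / multiple)) * multiple)
    return widths


if __name__ == "__main__":
    print(u_shaped_widths(9, 1024, 256))
```

For example, `u_shaped_widths(9, 1024, 256)` yields `[1024, 896, 640, 384, 256, 384, 640, 896, 1024]`. The narrowest layer sits in the middle, which is where the sketch places the bottleneck.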

→ View original post on X — @askalphaxiv

18 June 2026

Distribution of thought paths in a continuous latent space

By

@askalphaxiv

–

17 June 2026 19h43

“Latent Thought Flow” This article moves reasoning into a continuous latent space, but instead of learning a single hidden thought path, it learns a distribution over many paths. Using a continuous GFlowNet, Latent Thought Flow assigns a…
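A common GFlowNet training objective is trajectory balance. It requires that log Z plus the summed forward log-probabilities along a path equal the log-reward plus the summed backward log-probabilities. The sketch below computes that residual for a continuous latent path with Gaussian forward and backward policies. The excerpt does not say which objective Latent Thought Flow uses. The Gaussian policies, the fixed backward policy, and all function names are assumptions.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def gaussian_log_density(x: Vector, mean: Vector, std: float) -> float:
    """log N(x; mean, std^2 I) for an isotropic Gaussian."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / std ** 2 - d * math.log(std) - 0.5 * d * math.log(2 * math.pi)


def trajectory_balance_loss(
    log_z: float,
    path: List[Vector],
    forward_mean: Callable[[Vector], Vector],
    forward_std: float,
    backward_std: float,
    log_reward: float,
) -> Tuple[float, float]:
    """Squared trajectory-balance residual for one latent path s_0 -> ... -> s_K.

    residual = log Z + sum log P_F(s_{k+1} | s_k) - log R(s_K) - sum log P_B(s_k | s_{k+1})
    """
    log_pf = 0.0
    log_pb = 0.0
    for k, (prev, nxt) in enumerate(zip(path[:-1], path[1:])):
        # Forward policy (here: a Gaussian around forward_mean(prev)).
        log_pf += gaussian_log_density(nxt, forward_mean(prev), forward_std)
        # Hypothetical fixed backward policy: a Gaussian around the later state.
        # The step back to the fixed start s_0 is deterministic, so it adds log 1 = 0.
        if k > 0:
            log_pb += gaussian_log_density(prev, nxt, backward_std)
    residual = log_z + log_pf - log_reward - log_pb
    return residual ** 2, residual
```

In training, `log_z` and the forward policy's parameters would be updated by gradient descent on this loss. Plain Python here only evaluates the loss.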

→ View original post on X — @askalphaxiv

17 June 2026

ExpRL uses reference solutions as reward scaffolds for exploratory RL

By

@askalphaxiv

–

17 June 2026 19h43

“ExpRL: Exploratory RL for LLM Mid-Training” Sparse-reward RL works only when the base model can already find useful reasoning paths; on hard problems it often gets no signal. This paper uses reference solutions as reward scaffolds instead of imitation targets, letting an…
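Here is a toy contrast between an outcome-only reward and a scaffolded one. It assumes a reference solution split into milestone steps. With a wrong final answer, the sparse reward is zero, while the scaffolded version still gives partial credit for milestones the model reached. It does this without training the model to copy the reference. The exact-match milestone check, `scaffolded_reward`, and `scaffold_weight` are hypothetical. The excerpt does not describe how ExpRL actually builds its reward.

```python
from typing import List


def sparse_reward(answer: str, reference_answer: str) -> float:
    """Outcome-only reward: 1.0 for a matching final answer, otherwise 0.0."""
    return 1.0 if answer.strip() == reference_answer.strip() else 0.0


def scaffolded_reward(
    steps: List[str],
    answer: str,
    reference_steps: List[str],
    reference_answer: str,
    scaffold_weight: float = 0.5,
) -> float:
    """Hypothetical scaffold: outcome reward plus partial credit for reference milestones reached.

    The reference solution is only used to score the model's own steps; the model is
    never trained to reproduce it token by token.
    """
    outcome = sparse_reward(answer, reference_answer)
    if not reference_steps:
        return outcome
    produced = {s.strip() for s in steps}
    # Crude exact-match check; a real system would need a softer notion of "reached".
    hits = sum(1 for s in reference_steps if s.strip() in produced)
    return outcome + scaffold_weight * hits / len(reference_steps)
```

With 2 of 4 milestones matched and a wrong answer, `sparse_reward` returns 0.0 and `scaffolded_reward` returns 0.25 (0.5 coverage times a weight of 0.5). The policy therefore still receives a learning signal on a problem it fails.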

→ View original post on X — @askalphaxiv

17 June 2026