@askalphaxiv - AI Dynamics

Qwen-VLA: Unified Vision-Language-Action Robot Learning

By @askalphaxiv – 29 May 2026 19h58

“Qwen-VLA: Unifying VLA Modeling across Tasks, Environments, and Robot Embodiments.” They turned robot learning into one vision-language-action modeling problem instead of separate policies for each task, environment, and robot body. By adding a DiT flow-matching action …

→ View original post on X — @askalphaxiv,

29 May 2026
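The excerpt breaks off at the DiT flow-matching action component. For orientation only, here is a minimal sketch of how a flow-matching head generally turns noise into an action at inference time: start from Gaussian noise and integrate a velocity field from t = 0 to t = 1 with Euler steps. `toy_velocity` is a hypothetical stand-in for the DiT. It is the exact velocity for a single fixed target action under a straight-line noise-to-action path, whereas a real VLA would predict the velocity from the robot's observations and the language instruction.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned DiT velocity network.

    For a data distribution concentrated on one action `target` and the
    straight-line path x_t = (1 - t) * noise + t * target, the exact
    flow-matching velocity at (x, t) is (target - x) / (1 - t).
    """
    return [(a - xi) / (1.0 - t) for xi, a in zip(x, target)]


def sample_action(velocity_fn, dim, num_steps=10, seed=0):
    """Euler-integrate dx/dt = v(x, t) from Gaussian noise (t=0) to t=1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # initial noise sample
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt  # t stays below 1, so toy_velocity never divides by zero
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Usage: a 3-D "action" (e.g. end-effector deltas); the values are illustrative.
target_action = [0.2, -0.5, 1.0]
action = sample_action(lambda x, t: toy_velocity(x, t, target_action), dim=3)
print(action)
```

Because the toy velocity is exact, the final Euler step lands on the target action up to floating-point error; with a learned velocity the result would only approximate a sample from the action distribution.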

Gemini Embedding 2: Native Multimodal Embedding Model

By @askalphaxiv – 29 May 2026 19h56

"Gemini Embedding 2." This paper turns Gemini into a single native embedding model for text, image, video, audio, and interleaved multimodal inputs. Instead of converting everything into text first, it embeds raw modalities directly into one shared space, improving audio search, …

→ View original post on X — @askalphaxiv,

29 May 2026
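What "one shared space" buys in practice: once every modality maps to vectors in the same space, cross-modal search reduces to nearest-neighbour ranking, typically by cosine similarity. The sketch below assumes the embeddings already exist; the three-dimensional vectors and file names are invented placeholders, not outputs of Gemini Embedding 2 or calls to any real API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def search(query_vec, index, k=1):
    """Return the ids of the k items (of any modality) closest to the query."""
    ranked = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]


# Invented 3-d vectors standing in for embeddings of different modalities.
index = {
    "image:dog.jpg": [0.9, 0.1, 0.0],
    "audio:bark.wav": [0.8, 0.2, 0.1],
    "video:traffic.mp4": [0.0, 0.1, 0.95],
}
text_query = [0.85, 0.15, 0.05]  # pretend embedding of the text "a dog barking"

print(search(text_query, index, k=2))  # ['image:dog.jpg', 'audio:bark.wav']
```

The point of the design is that a text query can retrieve an audio clip directly, with no intermediate transcript, because both live in the same vector space.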

LeJEPA World Model Learning Under Gaussian Latent Dynamics

By @askalphaxiv – 29 May 2026 0h44

New paper from Yann LeCun! "When Does LeJEPA Learn a World Model?" This paper proves that under Gaussian latent dynamics, LeJEPA can recover the hidden state behind nonlinear observations up to rotation. The intuition is that linear latent features are the most stable across …

→ View original post on X — @askalphaxiv,

29 May 2026
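Recovering the hidden state "up to rotation" means the learned features equal the true latent state multiplied by some unknown rotation, so the axes themselves cannot be pinned down; the rotation can, however, be estimated afterwards when ground truth is available. A minimal two-dimensional illustration on synthetic Gaussian latents, using the closed-form orthogonal Procrustes solution (a standard alignment technique, not something the excerpt attributes to the paper):

```python
import math
import random


def rotate(points, theta):
    """Rotate 2-D points counter-clockwise by `theta` radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def procrustes_angle(source, target):
    """Least-squares rotation angle mapping `source` onto `target` (2-D Procrustes)."""
    cross = sum(x1 * y2 - y1 * x2 for (x1, y1), (x2, y2) in zip(source, target))
    dot = sum(x1 * x2 + y1 * y2 for (x1, y1), (x2, y2) in zip(source, target))
    return math.atan2(cross, dot)


# Synthetic "true" latent states and features that match them only up to rotation.
rng = random.Random(0)
true_latents = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
learned_features = rotate(true_latents, 0.7)  # unknown rotation, chosen for the demo

angle = procrustes_angle(true_latents, learned_features)
aligned = rotate(true_latents, angle)
max_error = max(abs(a - b) for p, q in zip(aligned, learned_features) for a, b in zip(p, q))
print(angle, max_error)
```

In higher dimensions the same alignment is usually computed from an SVD of the cross-covariance matrix; two dimensions keep the sketch to the standard library.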

On-Policy Distillation: Emerging AI Post-Training Method

By @askalphaxiv – 27 May 2026 19h17

A new kind of post-training method is emerging in 2026: On-Policy Distillation (OPD). It’s already showing up across frontier open-weight model releases and is quickly becoming a technique worth understanding. To help you get up to speed, we’ve compiled a list of the most …

→ View original post on X — @askalphaxiv,

27 May 2026

MiniMax-M2 Agent-Native RL Training Paper Released

By @askalphaxiv – 27 May 2026 5h49

The MiniMax-M2 paper just dropped. The key focus of M2 is something more agent-native: it trains on runnable workspaces and artifact-grounded rewards, then uses Forge to scale RL over long coding, app, search, and office-task trajectories. What's interesting is that M2.7 …

→ View original post on X — @askalphaxiv,

27 May 2026
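The excerpt does not spell out how M2's artifact-grounded rewards are computed. As a hypothetical illustration of the general idea (scoring what an agent actually produces by executing it, rather than judging its text), here is a minimal reward that runs a generated function against test cases. `artifact_reward` and its test format are invented for this sketch, and calling `exec` on untrusted code is only acceptable inside a real sandbox.

```python
def artifact_reward(source_code, func_name, test_cases):
    """Score an agent-written artifact by running it, not by reading it.

    Returns the fraction of (args, expected) cases the produced function passes;
    code that fails to load earns 0.0.
    """
    namespace = {}
    try:
        exec(source_code, namespace)  # a real harness would run this in a sandbox
        fn = namespace[func_name]
    except Exception:
        return 0.0
    passed = 0
    for args, expected in test_cases:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on one case simply counts as a failure
    return passed / len(test_cases)


# Hypothetical task: the agent must write `add(a, b)`.
cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0), ((2, 2), 4)]
correct = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a * b\n"
broken = "def add(a, b) return a + b"

print(artifact_reward(correct, "add", cases))  # 1.0
print(artifact_reward(buggy, "add", cases))    # 0.5
print(artifact_reward(broken, "add", cases))   # 0.0
```

A reward like this is dense enough to give partial credit, yet it can only be earned by an artifact that actually runs.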

Looped Transformers: Frozen Checkpoint Inference Optimization

By @askalphaxiv – 27 May 2026 1h38

Another cool piece of research on Looped Transformers. They ask: "Can we loop a frozen, off-the-shelf checkpoint directly at inference time without any modifications?" Naive repetition pushes hidden states outside the distribution that later layers expect, so performance …

→ View original post on X — @askalphaxiv,

27 May 2026
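To make "looping a frozen checkpoint" concrete: the weights stay fixed and only the execution schedule changes, with a middle block of layers run several times before the final layers. The toy residual layers below are hypothetical stand-ins for transformer blocks; because each one enlarges the magnitude of its input, extra passes inflate the hidden-state scale beyond what the later layers receive in a single pass, a simple proxy for the distribution shift the excerpt describes. The paper's fix is cut off in the excerpt and is not sketched here.

```python
import math


def make_residual_layer(scale):
    """Hypothetical toy layer: h -> h + scale * tanh(h), applied elementwise."""
    return lambda h: [x + scale * math.tanh(x) for x in h]


def looped_forward(h, prelude, loop_block, coda, num_loops):
    """Run `prelude` once, `loop_block` num_loops times, then `coda` once.

    No weights change; only the order in which the frozen layers execute.
    """
    for layer in prelude:
        h = layer(h)
    for _ in range(num_loops):
        for layer in loop_block:
            h = layer(h)
    for layer in coda:
        h = layer(h)
    return h


def rms(h):
    """Root-mean-square magnitude of a hidden-state vector."""
    return math.sqrt(sum(x * x for x in h) / len(h))


# Six frozen toy layers split into prelude / looped middle / coda.
layers = [make_residual_layer(0.5) for _ in range(6)]
prelude, block, coda = layers[:2], layers[2:4], layers[4:]
h0 = [0.3, -0.2, 0.1]

single_pass = looped_forward(h0, prelude, block, coda, num_loops=1)
looped = looped_forward(h0, prelude, block, coda, num_loops=4)
print(rms(single_pass), rms(looped))  # looping inflates the RMS in this toy
```

With `num_loops=1` the schedule reduces to an ordinary forward pass, which is a useful sanity check before experimenting with more loops.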

Language Models Sleep: Context Replay for Deep Reasoning

By @askalphaxiv – 27 May 2026 1h38

"Language Models Need Sleep." Instead of thinking longer at answer time, this paper makes LLMs sleep before forgetting. They replay old context, write it into fast weights, clear the KV cache, and answer later at normal speed. More sleep improves deep reasoning over long and …

→ View original post on X — @askalphaxiv,

27 May 2026
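The replay step can be pictured with the classic outer-product fast-weight rule: each replayed key-value pair is written into a matrix W as W += v kᵀ, after which the cached pairs can be dropped and a query is answered from W alone as the product W q. The excerpt does not say which update rule the paper uses, so treat this as a generic stand-in; retrieval is exact here only because the toy keys are orthonormal.

```python
def write_fast_weights(pairs, dim_k, dim_v):
    """Accumulate W += v k^T over replayed (key, value) pairs."""
    W = [[0.0] * dim_k for _ in range(dim_v)]
    for k, v in pairs:
        for i in range(dim_v):
            for j in range(dim_k):
                W[i][j] += v[i] * k[j]
    return W


def read_fast_weights(W, q):
    """Answer a query from the weights alone: the matrix-vector product W q."""
    return [sum(w * qj for w, qj in zip(row, q)) for row in W]


# Toy "KV cache": orthonormal keys with arbitrary values.
kv_cache = [([1.0, 0.0, 0.0], [5.0, 1.0]), ([0.0, 1.0, 0.0], [-2.0, 3.0])]
W = write_fast_weights(kv_cache, dim_k=3, dim_v=2)  # "sleep": consolidate the context
kv_cache.clear()  # the cache is gone; only W remains
recalled = read_fast_weights(W, [1.0, 0.0, 0.0])
print(recalled)  # [5.0, 1.0]
```

With non-orthogonal keys, stored items interfere with one another, which is one reason practical fast-weight schemes add normalisation or learned update rules.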

DeepMind: LLM Lean Proof-Search Agents Solve Open Problems

By @askalphaxiv – 25 May 2026 7h19

This new DeepMind research turns LLMs into Lean proof-search agents, so every step must compile and the final proof is mechanically verified. Under this setup, they solved 9 open Erdős problems, proved 44 OEIS conjectures, and helped advance actual research in optimization, …

→ View original post on X — @askalphaxiv,

25 May 2026

ConvexTok: Linear Programming for Optimal LLM Tokenization

By @askalphaxiv – 23 May 2026 19h46

"Tokenisation via Convex Relaxations." Most LLM tokenizers still use BPE, a greedy merge algorithm that can waste vocabulary slots on locally good but globally suboptimal tokens. This paper turns tokenizer training into a linear program, then rounds the solution into ConvexTok. This …

→ View original post on X — @askalphaxiv,

23 May 2026
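For contrast with ConvexTok, the greedy step that BPE repeats can be sketched in a few lines: at every iteration it merges whichever adjacent symbol pair is most frequent right now, with no way to revisit earlier choices. This is a minimal, simplified BPE trainer (whitespace pre-tokenisation, ties broken by first occurrence); the linear-programming formulation the paper uses instead is not described in the excerpt and is not shown.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs, weighted by word frequency; return the top one."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Greedy BPE: repeatedly merge the currently most frequent pair."""
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges


print(train_bpe("low low low lower lowest", num_merges=3))
# [('l', 'o'), ('lo', 'w'), ('low', 'e')]
```

Each merge is locally optimal for the current counts, which is exactly the property the excerpt contrasts with a globally optimised vocabulary.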

Code as Agent Harness: Meta Paper Reframes AI Systems

By @askalphaxiv – 23 May 2026 9h09

"Code as Agent Harness." Agents are becoming less like chatbots that write code and more like systems that run on code. This new Meta paper reframes code as the harness around an agent: the executable layer for reasoning, acting, memory, verification, and coordination. The key …

→ View original post on X — @askalphaxiv,

23 May 2026