"Co-Evolving Policy Distillation" A lot of post-training today follows a simple recipe where you train separate experts, then distill them into one model. But the problem is, by the time distillation starts, the expert and the student have already drifted too far apart, so a
@askalphaxiv
-

Recursive Multi-Agent LLMs Collaborate in Latent Space
“Recursive Multi-Agent Systems” Many multi-agent LLM systems rely on agents passing text back and forth. This paper argues for a different approach: agents recur together in latent space. Agents refine latent thoughts, pass hidden states to one another,
-

DeepSeek Paper Explores Visual Primitives for Multimodal Reasoning
“Thinking with Visual Primitives” New paper from DeepSeek… But got taken down? Most multimodal models can look at an image, but they still mostly reason in language. So on tasks like counting, spatial reasoning, and mazes, words alone are a weak way to keep track of visual
-
alphaXiv Partners with OpenRouter for Direct Model Access
alphaXiv 🤝 OpenRouter
— alphaXiv (@askalphaxiv) April 30, 2026
Excited to announce our partnership with @OpenRouter
You can now hover over any model name on any paper, and you’ll be directly connected to OpenRouter.
In the pop up window, you’ll be able to see the provider name, model name, description, and its top… pic.twitter.com/cwOC1Cd7fH
-
alphaXiv: Explore Academic Papers Online
Check it out by visiting any paper on http://alphaXiv.org!
-

Estimating Black-Box LLM Size via Factual Knowledge Probes
"Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity" This paper estimate close source LLM size from long-tail facts by using 1400 probes of obscure knowledge. The idea is that reasoning can be compressed, but factual storage can't.
-

Preconditioned DeltaNet Adds Curvature-Aware Linear Recurrence Modeling
“Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences” This paper views linear recurrences through a least-squares/test-time regression lens, and adds the missing curvature information via preconditioning. Main idea: precondition the delta-rule
-

SWE-chat: Real-World Coding Agent Interactions Dataset
"SWE-chat: Coding Agent Interactions From Real Users in the Wild" This paper proposes a real-world coding-agent dataset. It tracks 6K sessions from actual developers, with prompts, tool calls, and line-level human vs. agent authorship. Main result is that agent autonomy is
-

Kwai Summary Attention: Learnable Tokens for Long Context
"Kwai Summary Attention Technical Report" This paper uses learnable summary tokens for long-context attention. It splits text into chunks, compresses each chunk into a summary token, keeps recent text dense within a sliding local window, and reads distant context through
-

Recurrent Transformer: Greater Effective Depth and Efficient Decoding
“The Recurrent Transformer: Greater Effective Depth and Efficient Decoding” Transformers are great at parallel processing, but they’re shallow through time, as each layer only lets tokens interact once. This paper changes that by storing keys/values from each layer’s output,
