AI Dynamics

Global AI News Aggregator

@shiqi_yang_147

  • Schmidhuber Claims LeCun’s JEPA Repackages His 1992 PMAX Work

    Seems this discussion is moving into a new chapter.

    Jürgen Schmidhuber (@SchmidhuberAI): Dr. LeCun's heavily promoted Joint Embedding Predictive Architecture (JEPA, 2022) [5] is the heart of his new company. However, the core ideas are not original to LeCun. Instead, JEPA is essentially identical to our 1992 Predictability Maximization system (PMAX) [1][14]. Details in reference [19], which contains many additional references.

    Motivation of PMAX [1][14]: since details of inputs are often unpredictable from related inputs, two non-generative artificial neural networks interact as follows: one net tries to create a non-trivial, informative, latent representation of its own input that is predictable from the latent representation of the other net's input.

    PMAX [1][14] is actually a whole family of methods. Consider the simplest instance in Sec. 2.2 of [1]: an autoencoder net sees an input and represents it in its hidden units (its latent space). The other net sees a different but related input and learns to predict (from its own latent space) the autoencoder's latent representation, which in turn tries to become more predictable, without giving up too much information about its own input, to prevent what's now called "collapse." See illustration 5.2 in Sec. 5.5 of [14] on the "extraction of predictable concepts."

    The 1992 PMAX paper [1] discusses not only autoencoders but also other techniques for encoding data. The experiments were conducted by my student Daniel Prelinger. The non-generative PMAX outperformed the generative IMAX [2] on a stereo vision task. The 2020 BYOL [10] is also closely related to PMAX. In 2026, @misovalko, leader of the BYOL team, praised PMAX and listed numerous similarities to much later work [19].

    Note that the self-created "predictable classifications" in the title of [1] (and the so-called "outputs" of the entire system [1]) are typically INTERNAL "distributed representations" (as in the title of Sec. 4.2 of [1]). The 1992 PMAX paper [1] considers both symmetric and asymmetric nets. In the symmetric case, both nets are constrained to emit "equal (and therefore mutually predictable)" representations [1]. Sec. 4.2 on "finding predictable distributed representations" has an experiment with two weight-sharing autoencoders which learn to represent in their latent space what their inputs have in common (see the cover image of this post). Of course, back then compute was a million times more expensive, but the fundamental insights of "JEPA" were present, and LeCun has simply repackaged old ideas without citing them [5,6,19].

    This is hardly the first time LeCun (or others writing about him) have exaggerated LeCun's own significance by downplaying earlier work. He did NOT "co-invent deep learning" (as some know-nothing "AI influencers" have claimed) [11,13], and he did NOT invent convolutional neural nets (CNNs) [12,6,13], NOR was he even the first to combine CNNs with backpropagation [12,13]. While he got awards for the inventions of other researchers whom he did not cite [6], he did not invent ANY of the key algorithms that underpin modern AI [5,6,19].

    LeCun's recent pitch:
    1. LLMs such as ChatGPT are insufficient for AGI (which has been obvious to experts in AI & decision making, and is something he once derided @GaryMarcus for pointing out [17]).
    2. Neural AIs need what I baptized a neural "world model" in 1990 [8][15] (earlier, less general neural nets of this kind, such as those by Paul Werbos (1987) and others [8], weren't called "world models," although the basic concept itself is ancient [8]).
    3. The world model should learn to predict (in non-generative "JEPA" fashion [5]) higher-level predictable abstractions instead of raw pixels: that's the essence of our 1992 PMAX [1][14].

    Astonishingly, PMAX or "JEPA" seems to be the unique selling proposition of LeCun's 2026 company on world model-based AI in the physical world, which is apparently based on what we published over 3 decades ago [1,5,6,7,8,13,14], and modeled after our 2014 company on world model-based AGI in the physical world [8]. In short, little if anything in JEPA is new [19]. But then the fact that LeCun would repackage old ideas and present them as his own clearly isn't new either [5,6,18,19].

    FOOTNOTES
    1. Note that PMAX is NOT the 1991 adversarial Predictability MINimization (PMIN) [3,4]. However, PMAX may use PMIN as a submodule to create informative latent representations [1] (Sec. 2.4), and to prevent what's now called "collapse." See the illustration on page 9 of [1].
    2. Note that the 1991 PMIN [3] also predicts parts of latent space from other parts. However, PMIN's goal is to REMOVE mutual predictability, to obtain maximally disentangled latent representations called factorial codes. PMIN by itself may use the autoencoder principle in addition to its latent space predictor [3].
    3. Neither PMAX nor PMIN was my first non-generative method for predicting latent space, which was published in 1991 in the context of neural net distillation [9]. See also [5-8].
    4. While the cognoscenti agree that LLMs are insufficient for AGI, JEPA is so, too. We should know: we have had it for over 3 decades under the name PMAX! Additional techniques are required to achieve AGI, e.g., meta learning, artificial curiosity and creativity, efficient planning with world models, and others [16].

    REFERENCES (easy to find on the web):
    [1] J. Schmidhuber (JS) & D. Prelinger (1993). Discovering predictable classifications. Neural Computation, 5(4):625-635. Based on TR CU-CS-626-92 (1992). people.idsia.ch/~juergen/pre…
    [2] S. Becker & G. E. Hinton (1989). Spatial coherence as an internal teacher for a neural network. TR CRG-TR-89-7, Dept. of CS, U. Toronto.
    [3] JS (1992). Learning factorial codes by predictability minimization. Neural Computation, 4(6):863-879. Based on TR CU-CS-565-91 (1991).
    [4] JS, M. Eldracher & B. Foltin (1996). Semilinear predictability minimization produces well-known feature detectors. Neural Computation, 8(4):773-786.
    [5] JS (2022-23). LeCun's 2022 paper on autonomous machine intelligence rehashes but does not cite essential work of 1990-2015.
    [6] JS (2023-25). How 3 Turing awardees republished key methods and ideas whose creators they failed to credit. Technical Report IDSIA-23-23.
    [7] JS (2026). Simple but powerful ways of using world models and their latent space. Opening keynote for the World Modeling Workshop, 4-6 Feb 2026, Mila – Quebec AI Institute.
    [8] JS (2026). The Neural World Model Boom. Technical Note IDSIA-2-26.
    [9] JS (1991). Neural sequence chunkers. TR FKI-148-91, TUM, April 1991. (See also Technical Note IDSIA-12-25: Who invented knowledge distillation with artificial neural networks?)
    [10] J. Grill et al. (2020). Bootstrap your own latent: A "new" approach to self-supervised learning. arXiv:2006.07733.
    [11] JS (2025). Who invented deep learning? Technical Note IDSIA-16-25.
    [12] JS (2025). Who invented convolutional neural networks? Technical Note IDSIA-17-25.
    [13] JS (2022-25). Annotated History of Modern AI and Deep Learning. Technical Report IDSIA-22-22, arXiv:2212.11279.
    [14] JS (1993). Network architectures, objective functions, and chain rule. Habilitation thesis, TUM. See Sec. 5.5 on "Vorhersagbarkeitsmaximierung" (Predictability Maximization).
    [15] JS (1990). Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report FKI-126-90, TUM.
    [16] JS (1990-2026). AI Blog.
    [17] @GaryMarcus. Open letter responding to @ylecun. A memo for future intellectual historians. Substack, June 2024.
    [18] G. Marcus. The False Glorification of @ylecun. Don't believe everything you read. Substack, Nov 2025.
    [19] JS (2026). Who invented JEPA? Technical Note IDSIA-3-22, IDSIA, Switzerland, March 2026. people.idsia.ch/~juergen/who…

    — https://nitter.net/SchmidhuberAI/status/2038989707917271210#m
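
    For readers who want the quoted Sec. 2.2 setup in more concrete terms, here is a minimal toy sketch in PyTorch of a PMAX-style objective: an autoencoder keeps its latent code informative via reconstruction, while a second network predicts that code from a related view, and the code itself is trained to become more predictable. The network sizes, synthetic data, and equal loss weighting are placeholder assumptions, not anything taken from [1] or [14].

    ```python
    # Toy PMAX-style sketch (not the 1992 implementation): two related views,
    # an autoencoder whose reconstruction loss keeps its latent code informative,
    # and a second net that predicts that code from the other view.
    import torch
    import torch.nn as nn

    def mlp(d_in, d_out, d_hidden=128):
        return nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_out))

    d_x, d_z = 32, 8                        # arbitrary input / latent dimensions
    encoder = mlp(d_x, d_z)                 # autoencoder's encoder: x_a -> z_a
    decoder = mlp(d_z, d_x)                 # autoencoder's decoder: z_a -> x_a
    predictor_enc = mlp(d_x, d_z)           # other net: encodes the related view x_b
    predictor_head = mlp(d_z, d_z)          # ... and predicts z_a from its own latent code

    params = (list(encoder.parameters()) + list(decoder.parameters())
              + list(predictor_enc.parameters()) + list(predictor_head.parameters()))
    opt = torch.optim.Adam(params, lr=1e-3)

    for step in range(1000):
        # Two related views of the same underlying signal, here toy noisy copies.
        common = torch.randn(64, d_x)
        x_a = common + 0.1 * torch.randn(64, d_x)
        x_b = common + 0.1 * torch.randn(64, d_x)

        z_a = encoder(x_a)                               # latent representation of view A
        z_b_pred = predictor_head(predictor_enc(x_b))    # prediction of z_a from view B

        pred_loss = ((z_b_pred - z_a) ** 2).mean()       # z_a should be predictable from view B
        recon_loss = ((decoder(z_a) - x_a) ** 2).mean()  # z_a must still describe x_a (anti-collapse)
        loss = pred_loss + recon_loss                    # equal weighting is an arbitrary choice here

        opt.zero_grad()
        loss.backward()
        opt.step()
    ```

    In this toy setup the gradient of the prediction loss also flows into the autoencoder's code, which is what lets the representation drift toward what the two views share while the reconstruction term keeps it from collapsing.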

    → View original post on X — @shiqi_yang_147, 2026-04-02 14:53 UTC

  • CapCut Launches Dreamina Seedance 2.0 AI Features in Japan

    Well. CapCut (@capcutapp): Calling all Japanese creators – we are rolling out Dreamina Seedance 2.0 in Japan today across the CapCut app, desktop, and web. Enjoy creating now!

    Where you can use Dreamina Seedance 2.0 in CapCut:
    → Web: Studio (omni reference supported)
    → Desktop: Media -> AI (i2v, t2v supported)
    → App: AI Lab, AI Generator, AI (i2v, t2v supported)

    — https://nitter.net/capcutapp/status/2038806148934189070#m

    → View original post on X — @shiqi_yang_147, 2026-03-31 15:19 UTC

  • Sarashina2.2-OCR Released: High-Precision Document Image Analysis Model

    📢 Sarashina2.2-OCR Now Available: releasing an OCR model specialized in document image analysis 🚀
    ✅ Markdown conversion while preserving layout
    ✅ Strong performance on vertical Japanese documents
    ✅ Detects diagrams and tables and outputs their positions
    Convert complex document images into high-precision, user-friendly formats for both humans and AI ✨
    Learn more here: https://huggingface.co/sbintuitions/sarashina2.2-ocr
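
    A hypothetical usage sketch, assuming the checkpoint can be loaded with Hugging Face transformers through a standard image-to-text interface (AutoProcessor / AutoModelForVision2Seq); the actual classes, prompt format, and generation settings for sarashina2.2-ocr may differ, so treat the model card as authoritative.

    ```python
    # Hypothetical example: run an image-to-Markdown pass with transformers.
    # Whether sarashina2.2-ocr uses exactly these Auto classes is an assumption.
    from transformers import AutoProcessor, AutoModelForVision2Seq
    from PIL import Image
    import torch

    model_id = "sbintuitions/sarashina2.2-ocr"
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    image = Image.open("scanned_page.png")                 # e.g. a vertical Japanese document page
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=1024)
    markdown = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
    print(markdown)                                        # layout-preserving Markdown, per the announcement
    ```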

    → View original post on X — @shiqi_yang_147, 2026-03-31 01:49 UTC

  • daVinci-MagiHuman: Fast audio-video generation transformer model released

    daVinci-MagiHuman is here: a 15B single-stream Transformer for joint audio-video generation. 🎬 Demo video 👇
    ⚡ Blazing fast: 5 s of 256p video in 2 s, 1080p in 38 s, on a single H100
    🎯 80.0% win rate vs Ovi 1.1, 60.9% vs LTX 2.3 (2,000 pairwise evals)
    ✅ WER 14.60%: best-in-class audio-visual sync, beats LTX 2.3 (19.23%) and Ovi 1.1 (40.45%)
    📚 6 languages: Mandarin, Cantonese, EN, JP, KR, DE, FR
    🧠 One unified stream: text + video + audio tokens, self-attention only
    🛠️ Full stack open: base + distilled (8-step, CFG-free) + super-res + inference code
    📄 Apache 2.0
    🤖 modelscope.cn/models/GAIR/da…
    💻 github.com/GAIR-NLP/daVinci-…
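
    As a rough illustration of what a "single-stream" design means in practice, the sketch below embeds toy text, video, and audio tokens, tags them with a modality embedding, concatenates them into one sequence, and runs plain self-attention over the whole thing. All dimensions, projections, and token counts are invented; this is not the daVinci-MagiHuman architecture or its released code.

    ```python
    # Toy single-stream multimodal transformer: one joint token sequence,
    # self-attention only, no per-modality cross-attention branches.
    import torch
    import torch.nn as nn

    d_model = 256
    text_emb = nn.Embedding(32_000, d_model)      # text token ids -> vectors
    video_proj = nn.Linear(1024, d_model)         # pre-tokenized video patch features -> vectors
    audio_proj = nn.Linear(128, d_model)          # audio frame/codec features -> vectors
    modality_emb = nn.Embedding(3, d_model)       # 0 = text, 1 = video, 2 = audio

    blocks = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
        num_layers=4,
    )

    # Toy inputs: a short prompt, a few video tokens, a few audio tokens.
    text_ids = torch.randint(0, 32_000, (1, 16))
    video_feats = torch.randn(1, 64, 1024)
    audio_feats = torch.randn(1, 32, 128)

    tokens = torch.cat([
        text_emb(text_ids) + modality_emb(torch.tensor(0)),
        video_proj(video_feats) + modality_emb(torch.tensor(1)),
        audio_proj(audio_feats) + modality_emb(torch.tensor(2)),
    ], dim=1)                                     # one joint sequence: (1, 16 + 64 + 32, d_model)

    out = blocks(tokens)                          # self-attention mixes all modalities jointly
    print(out.shape)                              # torch.Size([1, 112, 256])
    ```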

    → View original post on X — @shiqi_yang_147, 2026-03-30 09:18 UTC

  • NeurIPS 2026 Official Statement: Clarifying Sanctions Tool Link Error and Policy Adjustment

    Yoohoo. NeurIPS Conference (@NeurIPSConf): We want to speak directly to the concern many of you have expressed, and we owe you a clear explanation of what happened, why it happened, and where we stand now. We understand this situation caused genuine alarm and we take that seriously.

    In preparing the NeurIPS 2026 handbook, we included a link to a US government sanctions tool that covers a significantly broader set of restrictions than those NeurIPS is actually required to follow. This error was due to miscommunication between the NeurIPS Foundation and our legal team; there was never an intention to restrict participation beyond our mandatory compliance obligations. The responsibility for that error is ours as an organization, and we deeply apologize for the alarm and impact this miscommunication had on our community.

    We have updated the link and clarified the text of our policy, which is consistent with that of ACM and IEEE, as well as other international conferences and NeurIPS in the past. As in previous years, NeurIPS welcomes submissions from all compliant institutions and individuals.

    We want to reiterate that NeurIPS is a community-driven event, created by and for the community, and strives to be inclusive. The NeurIPS 2026 organizing committee was particularly saddened to learn of this institutional miscommunication. The organizing committee has taken on the responsibility of running the conference this year with the goal of fostering open communication, knowledge sharing, and global scientific discourse. We thank the community for bringing this issue to our attention and working with us through this situation.

    → View original post on X — @shiqi_yang_147, 2026-03-27 10:15 UTC

  • Deep dive series on Saining Xie’s seven-hour interview about AI

    I’ve started a new four-part deep dive series exploring a fascinating seven-hour interview with Saining Xie, hosted by Xiaojun Zhang. robonaissance.com/t/language… Saining Xie, cofounder and chief science officer of AMI Labs, believes the AI industry’s most successful technology is also its most seductive trap. His company has just raised $1.03 billion to prove it.

    张小珺 Xiaojun Zhang (@zhang_benita): A 7-hour podcast with Saining Xie. He has just begun a new journey on world models with Yann LeCun at AMI Labs. This was his first podcast appearance and his first long-form interview.

    A day after the snowfall in February 2026, in Brooklyn, New York, we started recording at 2 p.m. What followed became an unexpected marathon conversation that lasted until the early hours of the morning.

    The Chinese title of the interview is “Escaping Silicon Valley.” Yet throughout the conversation, he patiently listed the people who shaped his academic life, repeatedly sketching their personalities in vivid detail: Hou Xiaodi, Kaiming He, Yann LeCun, Fei-Fei Li, and others. These portraits are what give this “escape from Silicon Valley” conversation its human warmth.

    By the way, the YouTube version of the interview is below, with Chinese and English subtitles. And yes, we are using podcasts to model the world 😎

    A 7-hour marathon interview with Saining Xie: World Models, AMI Labs, Ya… piped.video/rIwgZWzUKm8?si=edxa… via @YouTube — https://nitter.net/zhang_benita/status/2033467851655512142#m

    → View original post on X — @shiqi_yang_147, 2026-03-21 09:54 UTC

  • Seoul World Model: AI-Powered Real City Navigation Simulation

    Looking forward to the streaming Google Maps navigator. Junyoung Seo (@jyseo_cv): What if a world model could render not an imagined place, but the actual city? We introduce Seoul World Model, the first world simulation model grounded in a real-world metropolis. TL;DR: we made a world model do RAG over millions of street views. proj: seoul-world-model.github.io/ — https://nitter.net/jyseo_cv/status/2033739972264792430#m
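
    The "RAG over millions of street views" idea can be pictured as a nearest-neighbour lookup that feeds a conditional generator. The toy index below only illustrates that retrieval step; the embedding model, index structure, and the way retrieved frames condition generation in the actual Seoul World Model are not described in the post and are invented here.

    ```python
    # Toy retrieval index over precomputed street-view embeddings.
    import numpy as np

    rng = np.random.default_rng(0)
    num_views, d = 100_000, 512
    streetview_embeddings = rng.standard_normal((num_views, d)).astype(np.float32)
    streetview_embeddings /= np.linalg.norm(streetview_embeddings, axis=1, keepdims=True)

    def retrieve(query_embedding: np.ndarray, k: int = 8) -> np.ndarray:
        """Return indices of the k street views most similar to the query."""
        q = query_embedding / np.linalg.norm(query_embedding)
        scores = streetview_embeddings @ q              # cosine similarity over the whole index
        return np.argpartition(-scores, k)[:k]

    query = rng.standard_normal(d).astype(np.float32)   # e.g. an encoding of the current pose/view
    neighbor_ids = retrieve(query)
    # In a full system, the retrieved frames would condition the world model's
    # next-frame generation rather than being returned directly.
    print(neighbor_ids)
    ```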

    → View original post on X — @shiqi_yang_147, 2026-03-19 10:34 UTC

  • Video Models Over Language Models for Robotics Manipulation Tasks

    Taking inspiration from VideoJAM, would the physical consistency of the generated videos also improve? Accurate action prediction requires physically plausible imagination; conversely, physically plausible imagination is best supported when it is consistent with feasible actions. A sketch of that joint objective follows below.

    Seonghyeon Ye (@SeonghyeonYe): VLAs (from VLMs) ❌ => WAMs (from video models) ✅
    Why WAMs?
    1️⃣ World Physics: VLMs know the internet, but video models implicitly model the physical laws essential for manipulation.
    2️⃣ The "GPT Direction": VLAs are like BERT (they rely heavily on task-specific post-training). WAMs are like GPT (pre-train & prompt), unlocking incredible zero-shot transfer!
    What I want to see in 2026:
    📈 Scaling Laws: we will see much clearer scaling laws for robotics compared to VLAs.
    🤝 Human-to-Robot Transfer: unlocking massive transfer capabilities using video as a shared representation space.
    🤖 Zero-Shot Mastery: moving from short-horizon tasks to long-horizon, dexterous manipulation without task-specific demonstrations.
    We recently open-sourced the checkpoints, training and inference code. Dive into the research! 👇
    📄 Paper: arxiv.org/abs/2602.15922
    💻 Code: github.com/dreamzero0/dreamz…
    🤗 HF: huggingface.co/GEAR-Dreams/D…
    — https://nitter.net/SeonghyeonYe/status/2024501978106061056#m
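
    One way to read the commentary above about imagination and feasible actions is as a joint training objective: a shared backbone predicts both future video features and actions, so each loss regularizes the other. The sketch below is a toy formulation of that idea only; it is not the DreamZero/WAM training code, and all shapes, modules, and the loss weighting are assumptions.

    ```python
    # Toy joint video-prediction + action-prediction objective on a shared backbone.
    import torch
    import torch.nn as nn

    d_obs, d_latent, d_action = 512, 256, 7      # e.g. frame features and a 7-DoF action

    backbone = nn.GRU(d_obs, d_latent, batch_first=True)   # shared temporal encoder
    video_head = nn.Linear(d_latent, d_obs)                 # predicts next-frame features
    action_head = nn.Linear(d_latent, d_action)             # predicts the executed action

    opt = torch.optim.Adam(
        list(backbone.parameters()) + list(video_head.parameters()) + list(action_head.parameters()),
        lr=3e-4,
    )

    # Toy batch: sequences of frame features plus the next frame and action per step.
    frames = torch.randn(8, 16, d_obs)
    next_frames = torch.randn(8, 16, d_obs)
    actions = torch.randn(8, 16, d_action)

    hidden, _ = backbone(frames)
    video_loss = ((video_head(hidden) - next_frames) ** 2).mean()   # "physically plausible imagination"
    action_loss = ((action_head(hidden) - actions) ** 2).mean()     # "feasible actions"
    loss = video_loss + action_loss                                 # joint objective couples the two

    opt.zero_grad()
    loss.backward()
    opt.step()
    ```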

    → View original post on X — @shiqi_yang_147, 2026-03-03 10:53 UTC

  • Industrial audio-video generation models reach stereo quality standards

    Recent industrial audio-video generation models already output stereo audio (more than good enough); what else is there left to do in this area? Hmm.

    → View original post on X — @shiqi_yang_147, 2026-02-25 03:31 UTC

  • WAMs Replace VLAs: Video Models for Advanced Robot Manipulation

    An elegant and simple pipeline.

    Seonghyeon Ye (@SeonghyeonYe): VLAs (from VLMs) ❌ => WAMs (from video models) ✅
    Why WAMs?
    1️⃣ World Physics: VLMs know the internet, but video models implicitly model the physical laws essential for manipulation.
    2️⃣ The "GPT Direction": VLAs are like BERT (they rely heavily on task-specific post-training). WAMs are like GPT (pre-train & prompt), unlocking incredible zero-shot transfer!
    What I want to see in 2026:
    📈 Scaling Laws: we will see much clearer scaling laws for robotics compared to VLAs.
    🤝 Human-to-Robot Transfer: unlocking massive transfer capabilities using video as a shared representation space.
    🤖 Zero-Shot Mastery: moving from short-horizon tasks to long-horizon, dexterous manipulation without task-specific demonstrations.
    We recently open-sourced the checkpoints, training and inference code. Dive into the research! 👇
    📄 Paper: arxiv.org/abs/2602.15922
    💻 Code: github.com/dreamzero0/dreamz…
    🤗 HF: huggingface.co/GEAR-Dreams/D…
    — https://nitter.net/SeonghyeonYe/status/2024501978106061056#m

    → View original post on X — @shiqi_yang_147, 2026-02-21 03:30 UTC