AI Dynamics

Global AI News Aggregator


daVinci-MagiHuman: Fast audio-video generation transformer model released

daVinci-MagiHuman is here: a 15B single-stream Transformer for joint audio-video generation.
⚡ Blazing fast: a 5-second 256p video in 2 s, or 1080p in 38 s, on a single H100
🎯 80.0% win rate vs Ovi 1.1 and 60.9% vs LTX 2.3 (2,000 pairwise evals)
✅ WER 14.60%: best-in-class audio-visual sync, ahead of LTX 2.3 (19.23%) and Ovi 1.1 (40.45%)
📚 6 languages: Chinese (Mandarin and Cantonese), English, Japanese, Korean, German, French
🧠 One unified stream: text + video + audio tokens, self-attention only
🛠️ Full stack open: base + distilled (8-step, CFG-free) + super-res + inference code
📄 Apache 2.0
🤖: modelscope.cn/models/GAIR/da…
💻: github.com/GAIR-NLP/daVinci-…

→ View original post on X — @shiqi_yang_147, 2026-03-30 09:18 UTC