AI Dynamics

Global AI News Aggregator

MULTIMODAL AI

Paper praised for executing Gato idea with humanoid; more work desired

By @nandodf, 28 June 2026, 01:15

This is a nice paper, well executed! @scott_e_reed had this in mind when developing Gato (https://arxiv.org/abs/2205.06175). I’m glad to see the idea executed with a humanoid and I’d love to see more work along this direction. Gato stood for General AgenT One. Sadly, we weren’t able to

→ View original post on X — @nandodf

28 June 2026
Using video to learn representations for control; touch also important

By @nandodf, 28 June 2026, 00:50

I think they also emerge from video. This is what I was the most excited about when helping with projects like Veo. My intent was never to create slop videos, but rather to use video to learn representations for control. I must say however that touch is super important and highly

→ View original post on X — @nandodf

28 June 2026
New version made with GPT Image 2 Edit; replacements offered on existing orders

By @levelsio, 27 June 2026, 23:15

Okay inputted that one and made a new one with GPT Image 2 Edit, if anyone who ordered likes this better, let me know and I'll replace your order 😀 x.com/levelsio/statu…

→ View original post on X — @levelsio

27 June 2026
World Labs improves at large open sky scenes

By @bilawalsidhu, 27 June 2026, 22:46

Nice, World Labs is getting good at large open sky scenes!

→ View original post on X — @bilawalsidhu

27 June 2026
GAP fixes hidden mismatch in multimodal AI visual evidence generation

By @jiqizhixin, 27 June 2026, 20:00

Why do multimodal AI models struggle to “think” visually without external tools? Alibaba, University of Waterloo, and the Vector Institute present GAP—a new method that fixes a hidden mismatch in how models generate internal visual evidence. Instead of feeding raw decoder

→ View original post on X — @jiqizhixin

27 June 2026
Streaming 3D reconstruction from single camera, real-time, open-source

By @datachaz, 27 June 2026, 09:39

🚨 Forget LIDAR.

The Robbyant team just dropped a streaming 3D model that reconstructs scenes live, at ~20 FPS, over long sequences.

One single camera. Runs in real time. Open-source.

Entirely end-to-end.

NO iterative optimization tricks and no post-processing cleanup steps!… pic.twitter.com/zo8GuGQYdI
Charly Wargnier (@DataChaz), 27 June 2026

→ View original post on X — @datachaz

27 June 2026
Baton: AI Framework for Joint Video-Audio Generation

By @jiqizhixin, 27 June 2026, 08:53

What if AI could plan video and audio together before generating? Researchers from Fudan University and Tencent Hunyuan present Baton, a new framework that creates shared semantic blueprints for joint video-audio generation. Instead of relying on coarse text prompts, Baton

→ View original post on X — @jiqizhixin

27 June 2026
AI cannot yet generate skill-teaching videos, says KIVI benchmark

By @jiqizhixin, 27 June 2026, 03:48

Can AI generate videos that actually teach you a skill or explain a concept? Fudan University and Shanghai Jiao Tong University researchers say not yet. They introduce KIVI, a benchmark that tests video generation on factual, information-seeking prompts—like procedures or

→ View original post on X — @jiqizhixin

27 June 2026
Describing vs. directing: recording the exact camera move for video models

By @bilawalsidhu, 27 June 2026, 01:44

“This is the difference between describing a shot and directing one.”

Video models are getting so good that people are finally getting 3d pilled

Way more fun to grab a phone and record the exact camera move you want vs. endlessly hitting the slot machine pic.twitter.com/wQRT2wsyLn
Bilawal Sidhu (@bilawalsidhu), 26 June 2026

→ View original post on X — @bilawalsidhu

27 June 2026
Senya turns any song into ASL performance using Pika stitching

By @pika_labs, 26 June 2026, 22:27

3. Senya: Turns any song into an ASL performance, using Pika to stitch and transform real signing clips into a music video that maintains both the accuracy of the lyrics and the original energy of the song.

Team: Tejas Mundhe, Nithila Sadheesh, Deeksha Vaidyanathan pic.twitter.com/KqJjalzeRP
Pika (@pika_labs), 26 June 2026

→ View original post on X — @pika_labs

26 June 2026
