AI Dynamics

Global AI News Aggregator

Cosmos 3: native action generation for VLM, world model, robot

Inputs and outputs span text, image, video, audio, and action. That last one is the big deal: Cosmos 3 was trained natively to generate actions, so the same checkpoint can run as a vision-language model (VLM), a video world model, or a robot policy, with no multi-model orchestration.

→ View original post on X — @kimmonismus