2/ What changed: old Cosmos split the work across separate models — one to understand a scene, one to generate video, one for controlled simulation.
Cosmos 3 fuses everything into a single Mixture-of-Transformers with two towers:
→ a reasoner (the VLM "brain")
→ a diffusion
Cosmos 3 merges reasoner and diffusion towers in one model
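
For intuition, here is a toy sketch of the general Mixture-of-Transformers idea: each token is routed to its own tower's weights, while one attention pass runs over all tokens so the towers can read each other. This is not NVIDIA's code, and nothing here reflects Cosmos 3's actual internals. The names `mot_layer`, `towers`, and `tower_ids`, the tiny 2-dimensional tokens, and the identity-matrix weights are all assumptions made for illustration.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def identity(n, scale=1.0):
    """n x n identity matrix, optionally scaled (toy stand-in for learned weights)."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


def mot_layer(tokens, tower_ids, towers):
    """One toy Mixture-of-Transformers layer (hypothetical, single head, no norm).

    tokens:    list of equal-length float vectors
    tower_ids: tower name for each token, e.g. "reasoner" or "diffusion"
    towers:    dict mapping tower name -> {"q", "k", "v", "ffn"} square matrices
    """
    # Tower-specific projections: each token uses its own tower's weights.
    q = [matvec(towers[t]["q"], x) for x, t in zip(tokens, tower_ids)]
    k = [matvec(towers[t]["k"], x) for x, t in zip(tokens, tower_ids)]
    v = [matvec(towers[t]["v"], x) for x, t in zip(tokens, tower_ids)]

    # Shared attention: every token attends over ALL tokens, from both towers.
    scale = math.sqrt(len(q[0]))
    mixed = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        mixed.append(
            [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(v[0]))]
        )

    # Tower-specific feed-forward (a single linear map here) plus a residual.
    return [
        [xi + yi for xi, yi in zip(x, matvec(towers[t]["ffn"], m))]
        for x, m, t in zip(tokens, mixed, tower_ids)
    ]


# Toy weights: the two towers differ only in their feed-forward matrix.
towers = {
    "reasoner": {"q": identity(2), "k": identity(2), "v": identity(2), "ffn": identity(2)},
    "diffusion": {"q": identity(2), "k": identity(2), "v": identity(2), "ffn": identity(2, 2.0)},
}
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tower_ids = ["reasoner", "reasoner", "diffusion"]
out = mot_layer(tokens, tower_ids, towers)
```

The point of the sketch is the split: parameters are per tower, attention is joint. Changing a reasoner token changes what the diffusion token sees, even though the two never share weights.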
