Hours of video, now searchable by your agent. We just released a new set of agent skills and modular architecture for the Metropolis Blueprint for Search and Summarization, eliminating the need for manual configuration of multiple microservices. Load the skills into a
MULTIMODAL AI
-

Omni Model Creative Applications: Video Translation and Consistency
Mind-blowing to see what’s already possible with the new Omni model! From sketching drone camera paths to instant multilingual video translations and seamless character consistency. Check out these incredible creative use cases highlighted by @joshwoodward. Time to experiment
-
AI Vision Model Limitations in Object Detection Tasks
This kind of prompt only works up to a point. If I ask it to put bounding boxes around all cars or all vehicles, it will mislabel lots of things while also hallucinating new things to label. pic.twitter.com/8B1CNnlbh5
Posted by fofr (@fofrAI), 29 May 2026
-

Qwen-VLA: Unified Vision-Language-Action Robot Learning
“Qwen-VLA: Unifying VLA Modeling across Tasks, Environments, and Robot Embodiments” They turned robot learning into one vision-language-action modeling problem instead of separate policies for each task, environment, and robot body. So by adding a DiT flow-matching action
-

Gemini Embedding 2: Native Multimodal Embedding Model
"Gemini Embedding 2" This paper turns Gemini into one native embedding model for text, image, video, audio, and interleaved multimodal inputs. Instead of converting everything into text first, it embeds raw modalities directly into one shared space, improving audio search,
-
Full-Stack Open-Source Video World Models Framework
minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models pic.twitter.com/hxUocDWGJj
Posted by AK (@_akhaliq), 29 May 2026
-

Multi-Model Person-of-Interest Detection on Voyager SDK
Multi-model person-of-interest ID + weapon detection, built on the Voyager SDK. The "weapon" was a lightsaber (Count Dooku's hilt). The specs were real though! 3x 4-chip Metis cards, 48 AIPU cores → 2.5 PetaOPS, 1,440+ model inferences/sec across multiple 8K streams at
-
Spatial Reasoning Benchmarks for AI Video Models
Larus went ham with this one! Love the synced highlighting on the camera path, something I wanted to try myself. Makes me think these could end up as spatial reasoning benchmarks for ai video models, esp in cities with existing 3d data as ground truth. pic.twitter.com/x8BBPPuOEu
Posted by Bilawal Sidhu (@bilawalsidhu), 29 May 2026
-
Wayve Labs Launches Frontier Physical AI Research Lab
Introducing Wayve Labs: @wayve_ai's frontier research lab for physical AI. Wayve Labs is where we'll pursue pioneering research in world models, spatial intelligence, and much, much more. Thanks to @ryajetha for the great write-up
