Hours of video, now searchable by your agent. We just released a new set of agent skills and modular architecture for the Metropolis Blueprint for Search and Summarization, eliminating the need for manual configuration of multiple microservices. Load the skills into a
@nvidiaai
-

Step 3.7 Flash MoE Model with 256K Context Released
Step 3.7 Flash is here ICYMI: 198B MoE with 11B active params, 256K context, native image + video support. Day 0 support is live on http://build.nvidia.com with GPU-accelerated endpoints, deploy with NVIDIA NIM inference microservices, and fine-tune with the NVIDIA NeMo
-
LocateAnything: Vision-Language Detection Model for AI Agents
This #CVPR2026 paper from our research team is trending #1 on @HuggingFace 🤗
Meet LocateAnything: a vision-language detection model that rethinks bounding box prediction. For AI agents and robots, “seeing” is only useful if a model can pinpoint where something is fast enough to… pic.twitter.com/2OGaQnUCnX
— NVIDIA AI (@NVIDIAAI) May 28, 2026
-

Linux Foundation OpenMDW Framework for Open Models
We're adopting the Linux Foundation’s OpenMDW framework across our open model families. This helps make open model licensing simpler and more consistent at scale. A single legal framework across models, code, documentation, and data helps reduce friction for developers and
-

Dynamo Snapshot: Fast Inference Startup on Kubernetes
Introducing Dynamo Snapshot, our approach to fast startup for inference workloads on Kubernetes, which reduces startup time from minutes to under 5 seconds. In production inference deployments, demand fluctuates over time. Cold-starting inference workloads can take minutes,
-
Text Diffusion and Elastic Reasoning from Nemotron Labs
From the Lab: Text Diffusion and Elastic Reasoning | Nemotron Labs https://x.com/i/broadcasts/1dxYllaRLMLJX…
-
16 Local AI Agents Running on DGX and MiniMax M2.7
(2x DGX Sparks) + MiniMax M2.7 NVFP4 = 16 local AI agents running simultaneously 👀 https://t.co/Oaf5J1dyuF
— NVIDIA AI (@NVIDIAAI) May 25, 2026
-

NVIDIA AI Announces TokenSpeed, a Fast Inference Engine for Agentic Workloads

TokenSpeed is a brand-new inference engine purpose-built for speed-of-light agentic workloads. Read their blog to learn more about its advanced KV cache management, safe and efficient scheduler, and pluggable layered kernel system designed for multi-silicon support. Plus, it
-
Building Sub-Agents with NVIDIA Nemotron 3 Nano Omni Tutorial
How the Developer Community Builds Sub-Agents with NVIDIA Nemotron 3 Nano Omni | Nemotron Labs https://t.co/asHUSeUPgl
— NVIDIA AI (@NVIDIAAI) May 5, 2026
-

Scaling Agentic Workloads: 400+ Tokens/sec/User on Vera Rubin
What does it actually take to run agentic workloads at scale? Agents push token consumption, context length, and latency into extremely demanding regions. Extreme co-design on the Vera Rubin platform is built for these complex workloads, delivering 400+ tokens/sec/user on
