How do we process long videos efficiently without losing crucial information? NVIDIA, Stanford University, and National University of Singapore have an answer! They introduce InfoTok, a breakthrough method inspired by Shannon's information theory. It intelligently allocates
MULTIMODAL AI
-
Goal-VLA: Zero-Shot Robot Manipulation from Images and Instructions
By
–
What if robots could perform complex manipulation tasks with zero prior examples, just from an image and instruction? Researchers from National University of Singapore, The University of Hong Kong, Peking University, and Tsinghua University present Goal-VLA! Their Goal-VLA uses
-

Meta’s Muse Spark impresses with vision capabilities and free access
By
–
people are finding all the cool things we built into muse spark – Deedy (@deedydas)
The coolest thing Meta AI's Muse Spark can do by far is counting objects! As you can tell, it's far from perfect. They call it "visual grounding" and it can count objects and draw bounding boxes. I've been playing with the new model and here's what I think so far:
Good stuff:
- Incredible at vision. Its ability to read text in images is the best I've seen.
- Really high quality at web design. It's the only model I've seen that uses Unsplash, OpenLibrary and other images by default.
- It's free! You don't pay to use Muse Spark Thinking.
Bad stuff:
- Meta's classic playbook of growth tactics is dodgy. They're sending Instagram notifications to people's friends without their consent. Their app-ranking jump is not organic.
- Reasoning itself is pretty solid but not best in class. It can do pretty advanced math and science problems.
The long-term threat here is that Meta has distribution and the ability to give their model away for free, which makes them a formidable threat to the big AI labs, particularly in consumer. https://nitter.net/deedydas/status/2043127931405529474#m
View original post on X – @alexandr_wang, 2026-04-12 03:54 UTC
-
xAI Unveils Voice API for Developers
By
–
One of the xAI API's most underrated features is our Voice API (which includes text-to-speech and voice agents). We're giving all developers access to literally the same tech that powers experiences like Grok in Tesla. docs.x.ai/developers/model-capabilities/audio/text-to-speech
View original post on X – @scobleizer, 2026-04-12 01:45 UTC
-
NVIDIA GTC 2026 Robotics Showcase Humanoid Technology Advances
By
–
Robotics at NVIDIA GTC 2026 https://youtu.be/ffk5ncHHW6Y?si=YQxY2QA7-BSXYBm1 via @YouTube #gtc #humanoidtech #humanoid #robot #Robotics #AI #TechRevolution #TechInnovation #ArtificialInteligence #PhysicalAI @lexfridman @KirkDBorne @Ronald_vanLoon @erikbryn @antgrasso @sallyeaves @Nicochan33
-
Multimodal Hexapod Robot Adapts Locomotion to Any Terrain
By
–
Multimodal Hexapod #Robot Switches Locomotion Modes to Adapt to Any Terrain
— Ronald van Loon (@Ronald_vanLoon) 12 April 2026
via @ZappyZappy7 #Robotics #EmergingTech #Technology #Innovation #TechForGood pic.twitter.com/DEVyTek5is
View original post on X – @ronald_vanloon, 2026-04-12 00:20 UTC
-
LongCat: Advanced Image Editing With Long-Context Understanding
By
–
LongCat: Scaling Image Editing With Long-Context Understanding
— Satya Mallick (@LearnOpenCV) 11 April 2026
In this episode of Artificial Intelligence: Papers and Concepts, we explore LongCat, a new approach to AI-powered image editing that focuses on handling complex, multi-step instructions with long-context understanding. Instead of making isolated edits, LongCat is designed to follow detailed prompts that require consistency across multiple changes, bringing AI closer to real creative workflows. pic.twitter.com/aYVvURSNLw
We break down why traditional image editing models struggle with sequential instructions, how LongCat maintains coherence across edits, and what this means for designers and creators working with AI tools. If you're interested in generative image editing, multimodal models, or the future of AI-assisted creativity, this episode explains why LongCat represents an important step toward more controllable and context-aware image generation.
Resources: Paper link: arxiv.org/pdf/2512.07584v1
Interested in Computer Vision and AI consulting and product development services? Email us at contact@bigvision.ai or visit us at bigvision.ai
View original post on X – @learnopencv, 2026-04-11 14:12 UTC
-
Gemini Multimodal AI: Video and Robotics Leadership Potential
By
–
imho Gemini is always one "oh wait" release away from being the leader; esp their multimodal stuff is extremely interesting, esp w/ video and/or [later] world models for robotics
-
Chinese Combat Robot Demonstrates Advanced Humanoid Technology
By
–
This Chinese Combat #Robot Feels Like Sci-Fi Come to Life
— Amitav Bhattacharjee (@bamitav) 11 April 2026
by @tweetciiiim
pic.twitter.com/nQFLQei8oe #humanoidtech #humanoid #robot #Robotics #AI #TechRevolution #TechInnovation #ArtificialInteligence #PhysicalAI #scifi @sonu_monika @enilev @Jagersbergknut @TysonLester
-
Moya: World’s First Biomimetic AI Robot with Human-like Features
By
–
Meet Moya: World's First 'Biomimetic AI Robot' That Can Bend, Smile and Wink with Unsettling Human-like Accuracy https://indiandefencereview.com/moya-world-first-biomimetic-ai-robot-human/
#biomimetic #humanoidtech #humanoid #robot #Robotics #AI #TechRevolution #TechInnovation #ArtificialInteligence #PhysicalAI @SpirosMargaris