AI sees things that humans can't
MULTIMODAL AI
-
AI-Powered Glasses Enable Deaf People to Read Conversations
AMAZING: Artificial intelligence-powered glasses let deaf people read conversations. And users can even rewind pictures. #AIForGood #iot #AI #ML #DX #deaf #ArtificialIntelligence
— Sean Gardner (@2morrowknight) December 2, 2022
-
RoentGen: Vision-Language Model for Chest X-rays
A later work, "RoentGen: Vision-Language Foundation Model for Chest X-ray Generation", was also conducted by Stanford's Department of Radiology, in collaboration with Tanishq Abraham (@iScienceLuvr) at @StabilityAI. https://stanfordmimi.github.io/RoentGen/
-
Stanford researchers generate synthetic chest X-rays with Stable Diffusion
Stanford researchers create synthetic yet realistic chest X-rays using #StableDiffusion. SD is typically used for art, not for science. In this case, the “radiographs” might be high quality enough to complement real image datasets. https://arxiv.org/abs/2210.04133
-
RA-CM3 Model Outperforms Baselines with 70% Less Compute
The RA-CM3 model significantly outperforms baseline multimodal models on both image & caption generation tasks, while using <30% of the compute of comparable models & exhibiting novel capabilities like knowledge-intensive image generation & multimodal in-context learning.
-
Meta AI Introduces RA-CM3: Multimodal Model for Text and Image Generation
New paper from our team at Meta AI. Retrieval-Augmented CM3 (RA-CM3) is the first multimodal model that can retrieve and generate mixtures of text and images — while also reducing training cost and model size. Now available on arXiv: https://arxiv.org/abs/2211.12561
-
Language Interfaces: The Future of Human-Computer Interaction
Language interfaces are going to be a big deal, I think. Talk to the computer (voice or text) and get what you want, for increasingly complex definitions of "want"! This is an early demo of what's possible (still a lot of limitations — it's very much a research release).
-
TorchMultimodal: PyTorch Library for Multimodal Multi-Task Models
TorchMultimodal is a @PyTorch library for training state-of-the-art multimodal multi-task models at scale. You can learn more and get started here: https://bit.ly/3XalYlA
-
AI Transforms Child Drawings Into Realistic Images
Bring a child's drawing to life in this demo. By teaching AI to work effectively with this quintessential human form of creativity, we hope this project will move us closer to building AI that can understand the world from a human PoV. https://sketch.metademolab.com
-
Make-A-Video: Text-to-Video Generation Research
Make-A-Video research builds on the progress made in text-to-image generation technology to enable text-to-video generation. You can see examples of the videos here ➡️ https://makeavideo.studio
— AI at Meta (@AIatMeta) November 29, 2022