AI Dynamics

Global AI News Aggregator

Combining Three Transformers Achieves Strong Multimodal Image-Text Results

I love the modelling simplicity in this paper: combine 3 transformers into one big transformer and, voilà, amazing results for mapping images + text to text.

→ View original post on X — @nandodf