Combining Three Transformers Achieves Strong Multimodal Image-Text Results
Global AI News Aggregator
I love the modelling simplicity in this paper: combine three transformers into one big transformer and, voilà, amazing results for mapping images plus text to text.