If you are curious how Multimodal LLMs work, I wrote a new article to explain the two main approaches, decoder-only- and cross-attention-style: https://magazine.sebastianraschka.com/p/understanding-multimodal-llms
…
Plus, I reviewed and summarized the 10 latest research papers to see how it's done in practice.
Happy reading!
Understanding Multimodal LLMs: Decoder-Only and Cross-Attention Approaches