Existing MLLMs suffer from distribution shifts, which limit their multimodal reasoning, particularly their Chain-of-Thought (CoT) performance. Mixed Preference Optimisation (MPO) is a PO algorithm that enhances multimodal reasoning by teaching the model to learn relative…
Mixed Preference Optimisation Enhances Multimodal Reasoning in MLLMs