We also found that multimodal reasoning capabilities emerge naturally after online RL (even when RL is only done on math & code text data), as long as you start from an initial multimodal model.
Multimodal Reasoning Emerges from Online RL Training
By
–
Leave a Reply