R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
R-4B: Auto-Thinking MLLMs via Bi-Mode Annealing and RL
