6/ LLaVA-RLHF: adapts factually augmented RLHF to align large multimodal models. The approach alleviates reward hacking in RLHF and improves performance on the LLaVA-Bench dataset, reaching 94% of the performance level of text-only GPT-4.
LLaVA-RLHF Achieves GPT-4-Level Performance on Multimodal Tasks
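The "94%" figure is a relative score rather than an absolute accuracy. Here is a minimal sketch of that arithmetic. It assumes LLaVA-Bench's usual protocol: a GPT-4 judge rates both the candidate model's answer and the text-only GPT-4 reference answer for each question on a 1-10 scale, and the benchmark reports the candidate's total as a percentage of the reference's total. The helper name `relative_score` and the scores below are hypothetical placeholders, not the benchmark's official script or results from the paper.

```python
from typing import Sequence


def relative_score(candidate: Sequence[float], reference: Sequence[float]) -> float:
    """Return the candidate's summed judge scores as a percentage of the reference's.

    Hypothetical helper: assumes paired per-question scores from the same judge,
    one entry per question for each model.
    """
    if len(candidate) != len(reference) or not reference:
        raise ValueError("need the same non-zero number of paired scores")
    total_ref = sum(reference)
    if total_ref == 0:
        raise ValueError("reference scores sum to zero")
    # With equal question counts, the ratio of sums equals the ratio of means.
    return 100.0 * sum(candidate) / total_ref


# Placeholder judge scores (illustrative only, not real benchmark data).
candidate_scores = [8, 7, 9, 6]
gpt4_scores = [9, 8, 9, 8]
print(f"{relative_score(candidate_scores, gpt4_scores):.1f}%")
```

Under this reading, a reported 94% means the judge's total for the candidate was 94% of its total for text-only GPT-4 on the same questions.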
