I haven't even touched on the vision capabilities possible in GPT-4. Some obvious ones:
- multimodal reasoning and action (e.g. the MM-REACT paper)
- better image generation (imagine GPT-4 iterating on your Midjourney prompts)
- few-shot prompting and learning through images/videos
GPT-4 Vision Capabilities: Multimodal Reasoning and Image Generation