New open release from @Apple – AIMv2 – large-scale vision encoders
> Outperforms CLIP and SigLIP on major multimodal understanding benchmarks
> Beats DINOv2 on open-vocabulary object detection and referring expression comprehension
> Strong recognition performance w/
Apple Releases AIMv2 Vision Encoders Outperforming CLIP