What if training AI to see were more like training it to talk? Researchers from the University of Michigan, NYU, Princeton, and the University of Virginia present NEPA. Instead of reconstructing pixels or using a contrastive loss, they train a vision model to predict the next embedding in a sequence, like
NEPA: Training Vision Models Like Language Models
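To make the idea of "predicting the next embedding" concrete, here is a toy sketch in plain Python. It is not the authors' implementation. The function names, the prefix-mean predictor (a stand-in for a real causal model), the `1 - cosine similarity` loss, and the example vectors are all assumptions chosen for illustration. The only part taken from the summary above is the overall setup: walk through a sequence of embeddings and score a prediction of each one from the embeddings that came before it.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prefix_mean_predictor(prefix):
    """Toy stand-in for a causal model: average the embeddings seen so far."""
    dim = len(prefix[0])
    return [sum(vec[i] for vec in prefix) / len(prefix) for i in range(dim)]


def next_embedding_loss(embeddings, predictor):
    """Mean of (1 - cosine similarity) between predicted and actual next embedding.

    The loss choice is an assumption for this sketch, not a claim about NEPA.
    """
    losses = []
    for t in range(1, len(embeddings)):
        predicted = predictor(embeddings[:t])  # predictor only sees positions < t
        target = embeddings[t]
        losses.append(1.0 - cosine_similarity(predicted, target))
    return sum(losses) / len(losses)


# Hypothetical 2-D "patch embeddings" for one image, in sequence order.
patch_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
loss = next_embedding_loss(patch_embeddings, prefix_mean_predictor)
print(loss)
```

In a real system the predictor would be a trained network and the embeddings would come from image patches. The sketch only shows the shape of the objective: each position is predicted from earlier positions, much as a language model predicts the next token from earlier tokens.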
