Holy shit… a single vanilla transformer is outperforming every SOTA 3D model. Depth Anything 3 just rewired 3D perception. A single plain transformer, with no fancy architecture, now reconstructs full 3D geometry from any set of images. One photo, 18 photos, posed or unposed, it
Single Vanilla Transformer Outperforms All SOTA 3D Models
