“Their design … has made it so that 1 core looks like 1 chip looks like a whole network. ML training with trillion parameter models requires a lot of compute, you need an architecture that operates the same with 1 chip or 1000 chips.” Thanks @IanCutress
Scalable Chip Architecture for Trillion Parameter ML Training
By
–