Yes. It’s not that we’ve discovered some magic bullet, but rather that JAX, or at least the open source version of it, is mostly optimized for small to medium-sized training runs on Google TPUs, whereas we need to massive training runs on Nvidia GPUs. Pipeline parallelism is
JAX GPU scaling and pipeline parallelism for training
By
–