That would only be true if the losses didn’t share any computation. I would think the far more common case of multiple losses would be regularizations on a shared set of layers, in which case running a separate backward pass for each loss would still give the same result, but be nearly twice as
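To make the point above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: a toy one-parameter shared "layer" `h = w * x`, a data loss `(h - y)^2`, a regularizer `lam * h^2`, and the hypothetical helper names `shared_backward`, `grad_separately`, and `grad_summed`. It is not any framework's actual autograd, just a hand-written chain rule with a counter on the shared step.

```python
# Toy setup (all names hypothetical): one shared layer h = w * x,
# a data loss (h - y)^2 and a regularizer lam * h^2, both defined on h.
calls = {"shared_backward": 0}


def shared_backward(dL_dh, x):
    """Backprop through the shared layer h = w * x; returns dL/dw."""
    calls["shared_backward"] += 1
    return dL_dh * x


def grad_separately(w, x, y, lam):
    """One backward pass per loss: the shared layer is traversed twice."""
    h = w * x
    g_data = shared_backward(2.0 * (h - y), x)  # gradient of (h - y)^2
    g_reg = shared_backward(2.0 * lam * h, x)   # gradient of lam * h^2
    return g_data + g_reg


def grad_summed(w, x, y, lam):
    """Sum the upstream gradients first: the shared layer is traversed once."""
    h = w * x
    dL_dh = 2.0 * (h - y) + 2.0 * lam * h
    return shared_backward(dL_dh, x)


calls["shared_backward"] = 0
g_sep = grad_separately(3.0, 2.0, 1.0, 0.5)
n_sep = calls["shared_backward"]

calls["shared_backward"] = 0
g_sum = grad_summed(3.0, 2.0, 1.0, 0.5)
n_sum = calls["shared_backward"]

print(g_sep, g_sum)  # 32.0 32.0 (same gradient either way)
print(n_sep, n_sum)  # 2 1 (the shared backward step runs twice when split)
```

In this toy the shared step is a single multiply, so the duplication is trivial. In a real network the shared layers usually account for most of the backward cost, so the duplicated work would be correspondingly larger.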
Multiple losses and shared computation in backpropagation analysis