LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers. Li et al.: https://arxiv.org/abs/2310.03294 #Transformers #MachineLearning #ArtificialIntelligence
