most useful bit of code I've written all year: call map() on a HuggingFace dataset in torch distributed mode (like DDP). as one example, this will let you compute embeddings for a dataset in parallel, using all the GPUs you have: http://gist.github.com/jxmorris12/69a730fee174f5309968e984c298f8f2…
Parallel Dataset Embedding Computing with HuggingFace and DDP
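
The gist itself is not reproduced here, so what follows is only a rough sketch of the general idea, not the author's code. It assumes a strided split: each rank applies the function to every `world_size`-th row, and the per-rank outputs are then interleaved back into the original order. The names `shard_indices`, `map_shard`, `merge_shards`, and `fake_embed` are hypothetical, and the ranks are simulated in a plain loop so the example runs without GPUs. In an actual torch distributed job, `rank` and `world_size` would presumably come from `torch.distributed.get_rank()` and `torch.distributed.get_world_size()`, and the per-rank results would be exchanged with a collective such as `torch.distributed.all_gather_object`.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def shard_indices(n: int, rank: int, world_size: int) -> List[int]:
    """Row indices handled by one rank (strided split: rank, rank + world_size, ...)."""
    return list(range(rank, n, world_size))


def map_shard(rows: Sequence[T], fn: Callable[[T], U], rank: int, world_size: int) -> List[U]:
    """Apply fn only to the rows that belong to this rank."""
    return [fn(rows[i]) for i in shard_indices(len(rows), rank, world_size)]


def merge_shards(shards: Sequence[Sequence[U]]) -> List[U]:
    """Interleave per-rank outputs back into the original row order."""
    world_size = len(shards)
    total = sum(len(shard) for shard in shards)
    merged: list = [None] * total
    for rank, shard in enumerate(shards):
        for j, value in enumerate(shard):
            # Row j of this rank's shard came from original index rank + j * world_size.
            merged[rank + j * world_size] = value
    return merged


def fake_embed(text: str) -> List[float]:
    # Hypothetical stand-in for a model forward pass on one GPU.
    return [float(len(text)), float(text.count("a"))]


rows = ["alpha", "beta", "gamma", "delta", "epsilon"]
world_size = 2  # assumed number of GPUs / processes

# Simulate what each rank would compute independently.
per_rank = [map_shard(rows, fake_embed, rank, world_size) for rank in range(world_size)]

# Simulate the gather step: every rank ends up with the full, ordered result.
embeddings = merge_shards(per_rank)
```

With this layout each row is embedded exactly once, and the merged list lines up with the original dataset order, so it can be attached back to the dataset as a new column.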