TIL, will look into! The thing that makes this a bit complicated right now is the start latency. What bloats up the setup time right now is the dataset and its tokenization, which is all done in Python right now. Installing huggingface datasets, downloading FineWeb 10B and
Start Latency Issues with Dataset Tokenization in Python
By
–
Leave a Reply