these mad scientists really just let strangers use their ChatGPT API key for free. in return, they collected 570k real conversations with ChatGPT. I've heard this data is super valuable, but was it really worth it?
@jxmnop
-
Distributed GPU Processing: File Synchronization and Aggregation
By
–
because in distributed mode, each GPU runs its own process, so this code runs multiple times simultaneously, once per GPU. in this setting, each GPU writes a file, waits for everyone, aggregates all the files, waits again, and then deletes its own file 🙂
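The write / wait / aggregate / wait / delete sequence above is essentially a file-based all-gather. Below is a minimal, stdlib-only sketch of it. It assumes threads as stand-ins for the per-GPU processes and `threading.Barrier` in place of `torch.distributed.barrier()`; the function names, the JSON format, and the `part_{rank}.json` file naming are hypothetical choices for illustration, not the original code.

```python
import json
import os
import tempfile
import threading


def aggregate_via_files(rank, world_size, barrier, tmp_dir, local_result):
    """One worker's view of the write / wait / aggregate / wait / delete protocol."""
    # 1. Write this worker's partial result to a rank-specific file (hypothetical naming).
    my_path = os.path.join(tmp_dir, f"part_{rank}.json")
    with open(my_path, "w") as f:
        json.dump(local_result, f)

    # 2. Wait until every worker has finished writing.
    barrier.wait()

    # 3. Read every worker's file in rank order, so all workers build the same result.
    combined = []
    for r in range(world_size):
        with open(os.path.join(tmp_dir, f"part_{r}.json")) as f:
            combined.extend(json.load(f))

    # 4. Wait again so no file is deleted while another worker is still reading it.
    barrier.wait()

    # 5. Delete only the file this worker created.
    os.remove(my_path)
    return combined


def simulate(world_size=4):
    """Assumption: one thread per simulated GPU; returns each worker's result and leftover files."""
    barrier = threading.Barrier(world_size)
    results = [None] * world_size
    with tempfile.TemporaryDirectory() as tmp_dir:
        def worker(rank):
            # Each simulated GPU "computes" a distinct chunk: [rank*10, rank*10 + 1].
            local = [rank * 10, rank * 10 + 1]
            results[rank] = aggregate_via_files(rank, world_size, barrier, tmp_dir, local)

        threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        leftover = os.listdir(tmp_dir)
    return results, leftover


if __name__ == "__main__":
    results, leftover = simulate(world_size=4)
    print(results[0])  # [0, 1, 10, 11, 20, 21, 30, 31]
    print(leftover)    # []
```

The second barrier is the one that is easy to forget: without it, a fast worker could delete its file while a slower worker is still reading it.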
-

Parallel Dataset Embedding Computation with HuggingFace and DDP
most useful bit of code I've written all year: call map() on a HuggingFace dataset in torch distributed mode (like DDP). as one example, this lets you compute embeddings for a dataset in parallel, using all the GPUs you have: http://gist.github.com/jxmorris12/69a730fee174f5309968e984c298f8f2…
-
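The linked gist isn't reproduced here. As a rough illustration of the general idea, the sketch below assumes a common shard / map / gather pattern: each rank takes a contiguous slice of the rows, maps an embedding function over only that slice, and the slices are concatenated back in rank order. Threads stand in for GPU processes, and `fake_embed` is a hypothetical placeholder for a real model. In an actual DDP job, the rank and world size would typically come from `torch.distributed.get_rank()` and `torch.distributed.get_world_size()`, the slicing could be done with `Dataset.shard(num_shards=world_size, index=rank, contiguous=True)` before calling `map()`, and the gather step could use a file-based exchange like the one described in the earlier tweet.

```python
import threading


def contiguous_shard_bounds(n_rows, num_shards, index):
    """Start/end of shard `index` when n_rows are split into contiguous, near-equal blocks."""
    div, mod = divmod(n_rows, num_shards)
    start = div * index + min(index, mod)
    end = start + div + (1 if index < mod else 0)
    return start, end


def fake_embed(text):
    """Hypothetical stand-in for a model forward pass: a tiny 2-d 'embedding'."""
    return [len(text), sum(map(ord, text)) % 97]


def distributed_map(rows, world_size, fn):
    """Each simulated rank maps fn over its own shard; shards are reassembled in rank order."""
    per_rank = [None] * world_size

    def worker(rank):
        # Assumption: a thread plays the role of one GPU process.
        start, end = contiguous_shard_bounds(len(rows), world_size, rank)
        per_rank[rank] = [fn(row) for row in rows[start:end]]

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # plays the role of a barrier before gathering

    # Gather: concatenating contiguous shards in rank order restores the original row order.
    return [out for shard in per_rank for out in shard]


if __name__ == "__main__":
    texts = ["a", "bb", "ccc", "dddd", "eeeee", "ffffff", "g"]
    parallel = distributed_map(texts, world_size=3, fn=fake_embed)
    print(parallel == [fake_embed(t) for t in texts])  # True
```

Contiguous shards matter for the gather step: concatenating them in rank order gives back the original row order, which an interleaved split would not.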
Computational Sub-area Emerges Within Prompt Engineering
cs will be the computational sub-area of the prompt engineering department
-
Prompt Engineering: The English Major Plot Twist
plot twist: the real prompt engineering major was just English all along
-
When Will Cornell CS Offer Prompt Engineering Major?
how many years until Cornell CS allows people to major in Prompt Engineering?
-
Managing ML Training Runs: Preventing Micromanagement Through Accountability
giving my wandb password to my roommate for a few days to stop myself from logging in and micromanaging training runs
-

LLMs at NLP Conferences: A Domain Mismatch Discussion
I think this is a really good idea. Sometimes talking about LLMs at NLP conferences feels like talking about chemistry with a bunch of particle physicists
