One of the big bottlenecks with LLMs and Vision Transformers is GPU memory on consumer devices. I wrote about my favorite techniques for reducing peak memory in PyTorch: https://lightning.ai/pages/community/tutorial/pytorch-memory-vit-llm/
Focused on techniques that don't require architecture changes! Suggestions welcome!
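The tutorial has the full details; as a flavor of what "no architecture changes" means, here is a minimal sketch of one widely used technique in this category, automatic mixed precision via `torch.autocast`, which lowers the compute dtype of eligible ops without touching the model definition. The model, sizes, and optimizer below are placeholders, not from the tutorial:

```python
import torch
import torch.nn as nn

# Placeholder model and data; any existing nn.Module works unchanged.
model = nn.Linear(512, 512)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(8, 512)

# autocast runs eligible ops in a lower-precision dtype, shrinking
# activation memory, with no change to the model architecture.
# On GPU you would typically use device_type="cuda" with float16
# plus a GradScaler; bfloat16 on CPU keeps this sketch runnable anywhere.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = model(x).pow(2).mean()

loss.backward()
optimizer.step()
```

Because the precision change lives in the training loop rather than the model, it composes with the other techniques in the post (e.g. gradient accumulation or checkpointing) without any rewrite of the architecture.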
Reducing GPU Memory for LLMs and Vision Transformers in PyTorch