Whisper Model 8-bit Loading: Memory-Efficient Inference

Like most models in the transformers library, all Whisper checkpoints can be loaded in a memory-efficient way. Passing load_in_8bit=True when loading the model quantizes its weights to 8-bit precision, so even a Whisper-large checkpoint fits in under 6.6 GB of VRAM.
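A minimal sketch of 8-bit loading, assuming a CUDA GPU and that transformers, accelerate, and bitsandbytes are installed; the checkpoint name `openai/whisper-large-v2` is one example, any Whisper checkpoint should work the same way:

```python
# Sketch: load a Whisper checkpoint with 8-bit weights via bitsandbytes.
# Assumes: pip install transformers accelerate bitsandbytes, plus a CUDA GPU.
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "openai/whisper-large-v2"  # example checkpoint (assumption)
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    model_id,
    load_in_8bit=True,   # quantize linear-layer weights to int8
    device_map="auto",   # let accelerate place layers on available devices
)

# Rough check of how much memory the quantized weights occupy.
print(f"{model.get_memory_footprint() / 1024**3:.2f} GiB")
```

The memory saving follows from simple arithmetic: Whisper-large has roughly 1.55 billion parameters, so fp32 weights alone need about 6.2 GB, while int8 weights need about 1.55 GB, leaving plenty of headroom for activations during inference.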