Token Generation Requires Full Model Matrices in Memory
Global AI News Aggregator
Not for running models. You need the whole thing in memory, because generating every single token involves calculations against the entire collection of weight matrices.
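To make the point concrete, here is a minimal sketch of a hypothetical toy model, not a real transformer: it has no attention or KV cache and conditions only on the previous token. It exists only to show that each decoding step multiplies the hidden state through every layer matrix plus the output projection. The names `make_toy_model` and `generate` are illustrative assumptions, not any library's API.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_toy_model(num_layers, dim, vocab_size, seed=0):
    """Build a hypothetical toy model: an embedding table, a stack of
    dim x dim layer matrices, and a vocab_size x dim output projection."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": rand_matrix(vocab_size, dim),  # one row per token id
        "layers": [rand_matrix(dim, dim) for _ in range(num_layers)],
        "unembed": rand_matrix(vocab_size, dim),
    }


def generate(model, prompt_token, steps):
    """Greedy decoding. Returns the generated tokens and, for each step,
    how many weight matrices were multiplied to produce that token."""
    tokens, matrices_used = [prompt_token], []
    for _ in range(steps):
        # The embedding lookup reads just one row of the table.
        hidden = model["embed"][tokens[-1]]
        used = 0
        for weights in model["layers"]:  # every layer matrix, every token
            hidden = [max(0.0, v) for v in matvec(weights, hidden)]  # ReLU
            used += 1
        logits = matvec(model["unembed"], hidden)  # plus the output projection
        used += 1
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
        matrices_used.append(used)
    return tokens[1:], matrices_used


if __name__ == "__main__":
    model = make_toy_model(num_layers=4, dim=8, vocab_size=16)
    tokens, used = generate(model, prompt_token=3, steps=5)
    print(tokens, used)  # used is [5, 5, 5, 5, 5]: 4 layers + 1 projection per token
```

Because no step can skip a layer, every matrix has to be readable on every token, and paging weights in from disk on each step would dominate the runtime. As rough arithmetic (ignoring activations and KV cache), a model with 7 billion parameters stored at 2 bytes each (fp16) needs about 7 × 10⁹ × 2 = 14 × 10⁹ bytes, roughly 14 GB, resident just for the weights.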