You’re optimizing the right things for Unified Memory. Speculative Decoding has some shortcomings (especially in more creative, less coding-heavy workflows), but the pros far outweigh the cons given the Unified Memory limitations. Gonna try this on my Apple hardware tonight.
Optimizing Unified Memory with Speculative Decoding on Apple Hardware