SD² systematically enhances draft token acceptance rates while significantly reducing Multiply-Accumulate operations (MACs), even in the Universal Assisted Generation (UAG) setting, where draft and target models originate from different model families.
SD² Enhances Draft Token Acceptance While Reducing MACs
