“Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences.” This paper views linear recurrences through a least-squares (test-time regression) lens and uses preconditioning to supply the curvature information that the first-order delta-rule update lacks. Main idea: precondition the delta rule.
Preconditioned DeltaNet Adds Curvature-Aware Linear Recurrence Modeling
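In the regression view, the delta rule is one gradient step on the per-token loss ½‖S k_t − v_t‖², and preconditioning rescales that step by an approximate inverse curvature. The sketch below is a minimal illustration under stated assumptions. The function names `delta_rule_step` and `preconditioned_step` are hypothetical. The preconditioner is the exact inverse of the regularized key covariance λI + Σ k_i k_iᵀ, maintained with a Sherman-Morrison update. That choice turns the recurrence into classic recursive least squares and is an illustrative stand-in, not necessarily the preconditioner the paper uses.

```python
import numpy as np


def delta_rule_step(S, k, v, beta):
    """Plain delta rule: one gradient step on 0.5 * ||S @ k - v||^2.

    S: (d_v, d_k) state matrix, k: (d_k,) key, v: (d_v,) value, beta: step size.
    """
    error = S @ k - v  # prediction error on the current token
    return S - beta * np.outer(error, k)


def preconditioned_step(S, P, k, v, beta):
    """Hypothetical curvature-aware delta rule (illustrative, not the paper's exact rule).

    P holds the inverse of lam * I + sum_i k_i k_i^T, the Hessian of the
    cumulative ridge-regression loss; it is updated with Sherman-Morrison.
    """
    Pk = P @ k
    # Sherman-Morrison: inverse after adding the rank-1 term k k^T (P is symmetric)
    P_new = P - np.outer(Pk, Pk) / (1.0 + k @ Pk)
    error = S @ k - v
    # Precondition the key side of the gradient outer(error, k)
    S_new = S - beta * np.outer(error, P_new @ k)
    return S_new, P_new


rng = np.random.default_rng(0)
d_k, d_v, T, lam = 4, 3, 50, 1e-2
keys = rng.standard_normal((T, d_k))
values = rng.standard_normal((T, d_v))

# Run the preconditioned recurrence from S_0 = 0, P_0 = I / lam
S = np.zeros((d_v, d_k))
P = np.eye(d_k) / lam
for k_t, v_t in zip(keys, values):
    S, P = preconditioned_step(S, P, k_t, v_t, beta=1.0)

# Closed-form ridge solution of min_S sum_i ||S k_i - v_i||^2 + lam * ||S||_F^2
ridge = np.linalg.solve(keys.T @ keys + lam * np.eye(d_k), keys.T @ values).T
print("preconditioned state matches ridge:", np.allclose(S, ridge))

# Plain delta rule with beta = 1 / ||k||^2 fits only the current pair exactly
k0, v0 = keys[0], values[0]
S0 = rng.standard_normal((d_v, d_k))
S1 = delta_rule_step(S0, k0, v0, beta=1.0 / (k0 @ k0))
print("delta rule fits current pair:", np.allclose(S1 @ k0, v0))
```

With `beta=1`, this preconditioned recurrence reproduces the ridge solution over every token seen so far, whereas a single delta-rule step only corrects the error on the current token. A full d_k × d_k preconditioner costs O(d_k²) per token, so a practical design may prefer a cheaper approximation; this sketch does not claim which one the paper adopts.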
