RNNs Match Transformer Memory Without Quadratic Cost

Huge! Recurrent neural networks may be able to match Transformer-level memory without the quadratic cost of attention. Ali Behrouz of Google and colleagues present Memory Caching (MC), a simple yet powerful method that lets RNNs store "memory checkpoints" of their internal state.
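The excerpt doesn't spell out the mechanism, but the general idea of an RNN snapshotting its hidden state into a cache it can later read from can be sketched in a few lines. Everything below is an illustrative assumption, not the paper's actual method: the dimensions, the checkpoint interval, and the dot-product read over the cache are all made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and checkpoint interval (illustrative assumptions).
D_IN, D_HID, CKPT_EVERY = 8, 16, 4

# Random RNN weights; a real model would learn these.
Wx = rng.normal(0, 0.1, (D_HID, D_IN))
Wh = rng.normal(0, 0.1, (D_HID, D_HID))

def run_with_checkpoints(xs):
    """Run a toy RNN that caches hidden-state "memory checkpoints".

    Every CKPT_EVERY steps the hidden state is snapshotted into a
    cache. Each step reads from the cache with a softmax over
    dot-product similarities -- cost grows with the (small) number of
    checkpoints, not quadratically with sequence length as full
    self-attention does.
    """
    h = np.zeros(D_HID)
    cache = []  # list of (step, hidden-state snapshot)
    for t, x in enumerate(xs):
        if cache:
            keys = np.stack([c for _, c in cache])
            scores = keys @ h
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            read = weights @ keys  # soft retrieval from checkpoints
        else:
            read = np.zeros(D_HID)
        h = np.tanh(Wx @ x + Wh @ h + read)
        if (t + 1) % CKPT_EVERY == 0:
            cache.append((t, h.copy()))
    return h, cache

xs = rng.normal(size=(20, D_IN))
h, cache = run_with_checkpoints(xs)
print(len(cache))  # 20 steps at one checkpoint per 4 steps -> 5
```

The cache here is just a list of state snapshots; any bounded or compressed store would preserve the key property that per-step cost stays far below attending over every past token.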