AI Dynamics

Global AI News Aggregator


RNN-based LLMs and information retention patterns

This is quite interesting … 1) I would expect that the opposite is true for, e.g., RNN-based LLMs like RWKV (since it's processing information sequentially, it might rather forget early information) 3/5

→ View original post on X — @rasbt
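
The intuition in the post, that a sequential model might forget early information, can be illustrated with a toy example. The sketch below is not RWKV's actual formulation and is not taken from the original post. It assumes a hypothetical scalar linear recurrence `h_t = decay * h_{t-1} + x_t` with a made-up `decay` value, and shows that an input's weight in the final state shrinks exponentially with its distance from the end of the sequence.

```python
# Toy illustration only: a scalar linear recurrence, NOT RWKV's real update rule.
# The names `decay`, `run_recurrence`, and `final_state_weights` are hypothetical.


def run_recurrence(inputs: list[float], decay: float) -> float:
    """Run h_t = decay * h_{t-1} + x_t over the inputs and return the final state."""
    h = 0.0
    for x in inputs:
        h = decay * h + x
    return h


def final_state_weights(seq_len: int, decay: float) -> list[float]:
    """Weight of each input position in the final state.

    Unrolling the recurrence gives h_T = sum_i decay**(T - 1 - i) * x_i,
    so the input at position i is scaled by decay**(T - 1 - i).
    """
    return [decay ** (seq_len - 1 - i) for i in range(seq_len)]


if __name__ == "__main__":
    seq_len, decay = 100, 0.9  # assumed values, chosen only for illustration
    weights = final_state_weights(seq_len, decay)
    print(f"weight of first input: {weights[0]:.2e}")
    print(f"weight of last input:  {weights[-1]:.2e}")
```

With any `decay` below 1, the earliest input contributes the least to the final state, which matches the expectation that a purely sequential model would favor recent context. Real recurrent architectures use learned, per-channel dynamics, so this sketch only conveys the direction of the effect, not its size.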