How do NLAs work? An NLA consists of two models. One converts activations into text. The other tries to reconstruct activations from this text. We train the models together to make this reconstruction accurate. This incentivizes the text to capture what’s in the activation.
