7/ The Hydra Effect – shows that language models exhibit self-repairing properties — when one layer of attention heads is ablated it causes another later layer to take over its function.
The Hydra Effect: Self-Repairing Properties in Language Models
By
–
