We applied GPT-4 to interpretability, automatically proposing explanations for GPT-2's 300k neurons, and found neurons responding to concepts like similes, “things done correctly,” or expressions of certainty. We aim to use AI to help us understand AI: https://openai.com/research/language-models-can-explain-neurons-in-language-models
…
GPT-4 Explains GPT-2 Neurons: Interpretability Breakthrough