IMO @AnthropicAI is very close to making a breakthrough in productizable interpretability. For ~4 years all we've had to really control LLMs is temperature/top_p and logit bias. We recently got `seed` and constrained structured output, with `interactive=false` on the way. But
Anthropic’s Progress Toward Productizable AI Interpretability