@AnthropicAI Are there any public benchmarks I can run when developing a system prompt for Claude Code? I'm an ML person so it's fine if it's a repo with a bunch of steps etc. It's hard to know what's working just by feel, because tasks are always different.
Request for Public Benchmarks to Evaluate Claude Code System Prompts