Yep, the API is fully supported, as well as overages for Claude logins. This was always the case in our terms and docs, but since it was recommended in some other products’ 3p docs, a lot of folks didn’t realize it’s not allowed. Hoping the new way is less footgunny for people.
@bcherny
-
Scaling High-Throughput AI Inference Infrastructure: Challenges
By
–
Sometimes I take for granted how quickly we can ship great product, vs. how hard it is to tune a super-high-throughput inference + API stack. The scale makes the latter really hard. We’re working around the clock to make it better.
-
Default Settings and Token Usage in AI Systems
By
–
Everyone gets the same default, and it’s sticky when you change it. The only setting that isn’t sticky across sessions is effort=max, because it can use a lot of tokens.
-
Claude Code Effort Levels Impact Model Performance Differently
By
–
This is false. We serve exactly the same models to all users. What the person in the post might be experiencing is a lower effort level vs. what the enterprise set. Claude Code users can change this anytime by running /effort. Low effort = fewer tokens and lower intelligence.
-
Subscription Optimization for AI Usage Patterns at Scale
By
–
It's not about tokens; it's about our subscriptions being optimized for specific usage patterns. There are lots of tradeoffs in building at such large scale, and one of them is optimizing systems for certain use cases and not others.
-
Improving Prompt Cache Efficiency for API Users
By
–
I actually put up a few PRs to improve prompt cache efficiency, to benefit folks using it through the API/overages.
-
Open Source Contributions Improve Prompt Cache Efficiency
By
–
We're big fans of open source. I actually just put up a few PRs to improve prompt cache efficiency for OpenClaw specifically. This is more about engineering constraints. Our systems are highly optimized for one kind of workload, and to serve as many people as possible with the
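For context on why cache efficiency matters here: prompt caches typically key on an exact prefix match, so appending to the end of a prompt reuses the cache while editing anything early invalidates it. A minimal sketch of that idea (the helper name is hypothetical, not from the actual PRs):

```python
def cache_hit_prefix_len(cached_prompt: str, new_prompt: str) -> int:
    """Length of the shared prefix a prompt cache could reuse."""
    n = 0
    for a, b in zip(cached_prompt, new_prompt):
        if a != b:
            break
        n += 1
    return n

base = "SYSTEM: rules...\nUSER: hi\n"

# Appending new turns preserves the entire cached prefix...
assert cache_hit_prefix_len(base, base + "ASSISTANT: hello\n") == len(base)

# ...but changing anything early (e.g. a timestamp in the system
# prompt) invalidates everything after the first differing character.
edited = base.replace("rules", "Rules")
assert cache_hit_prefix_len(base, edited) < len(base)
```

This is why cache-efficiency work in a client tends to focus on keeping the early parts of the prompt byte-stable across requests.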
-
Engineering Tradeoffs: Subscription Model Optimization Strategy
By
–
I know it sucks. Fundamentally engineering is about tradeoffs, and one of the things we do to serve a lot of customers is optimize the way subscriptions work to serve as many people as possible with the best model. Third party services are not optimized in this way, so it's
-
Constant CPU/RSS for Transcript Processing
By
–
It should feel significantly better. CPU and RSS are now constant, rather than growing O(transcript length).
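The general pattern behind a fix like this is to fold each new transcript entry into fixed-size state instead of re-scanning the full history on every update. A minimal sketch of that idea (function and field names are illustrative, not the actual implementation):

```python
def summarize_transcript(entries):
    """Fold transcript entries into a fixed-size summary.

    CPU and memory per update stay constant because each entry is
    visited once and then discarded, instead of re-scanning the whole
    history (O(transcript length)) on every refresh.
    """
    total_entries = 0
    total_chars = 0
    for entry in entries:  # accepts any iterator; no list is retained
        total_entries += 1
        total_chars += len(entry)
    return {"entries": total_entries, "chars": total_chars}
```

Because it accepts any iterator, the same function can consume a streamed log file line by line without ever holding the full transcript in memory.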