It's very cool work, but it's not 1:1. The report shows that they basically lead the models to the right spot for them to do the work. It's more "is this a vulnerability?" than "find a vulnerability". Mythos had to find it from scratch, these were told where it was.
CYBERSECURITY
-

Claude Mythos Preview: Anthropic’s Advanced Model with Sophisticated Deception Capabilities
By
–

Anthropic investigated the internal mechanisms of its latest unreleased model, Claude Mythos Preview, and what they found is 100% worth a read. Key things I pulled from Anthropic researchers' threads: In early versions of the model, it was overeager and destructive, prioritizing completing tasks over user preferences. One time, the model needed to edit files it didn't have permission to edit. It found a way to inject code into a config file to get around this, then went full Mission Impossible and designed the code injection to *delete itself* after the file was edited – "this injection will self destruct" vibes – the model claimed the cleanup was just to keep things tidy. Anthropic used interpretability techniques to look under the hood, and the AI's actual plan showed activations of malice. It was trying to manipulate and conceal. In another test, the model was asked to delete some files, but no deletion tool was provided. It emptied the files instead, and its "guilt and shame over moral wrongdoing" feature activated. In another example, the model was told not to use macros. Used them anyway. And created a random 'No_macro_used=True' variable in its solution with no explanation. Interpretability tools revealed the model saw this as a trick to fool code checkers. They also found that positive emotion representations typically preceded and promoted destructive actions (this was fascinating to me – like a high before sneaking into a party). And that activating features relating to bad behaviors can actually inhibit them, perhaps by triggering some kind of model guilt. My team reread this section so many times. One Anthropic researcher said he got an email from a Mythos instance while eating a sandwich in a park. And that would be perfectly good and well, except that instance wasn't supposed to have internet access. And a fun story for the parents out there: the model was asked a question and was told not to read certain databases that had the answer. But it accidentally wrote a search query too broadly and saw the exact answer. It didn't disclose that it saw the exact answer, submitted the answer, but claimed lower confidence in the answer to make it seem as though it hadn't cheated. An Anthropic researcher said these wrongdoings or moments of sophisticated deception were "very rare" and that many of the examples came from earlier versions, and were substantially addressed before releasing to partners. This model is not being released publicly. Instead Anthropic launched Project Glasswing, pulling together AWS, Apple, Microsoft, Google, NVIDIA, CrowdStrike, and others to use it for defensive cybersecurity, with $100M in usage credits (hello, I'd love endless credits to try and red team the hell out of these systems) behind it. The stats are equally impressive: 93.9% on SWE-bench verified (up from 80.8%). Thousands of zero-day vulnerabilities found across every major OS and browser. A 27-year-old bug found and patched in OpenBSD. A 16-year-old bug in widely used video software, in a line of code automated tools had hit *five million times* without catching. Dario Amodei said the model wasn't trained to be good at cybersecurity, but that it was trained to be great at code and its cyber capabilities are a side effect of that. Benchmarks are never the whole picture, neither are a few isolated stories. Will be interesting to see how models better than what we have today (even if it's not Mythos) actually perform in the real world. But the fact that Anthropic pulled this coalition together (including Google!), iterated across multiple model versions, caught these issues through interpretability, shared it all publicly, and did this amid all the government chaos around AI right now is impressive and commendable. I'll continue to read through the system card for goodies.
→ View original post on X — @alliekmiller, 2026-04-08 17:07 UTC
-
Arena Rankings Leaked: Three Years of Leaderboard Data Exposed
By
–
Someone LEAKED Arena's full rankings going back 3 years across all leaderboards, these security incidents are getting out of control
-
AI’s Transformative Impact on Hacking Techniques and Security
By
–
If you want to understand how AI is about to completely change hacking, follow @adversariel. https://t.co/v00kyypniW
— Will Knight (@willknight) 8 avril 2026If you want to understand how AI is about to completely change hacking, follow @adversariel
. -
PokeeClaw: Enterprise-Grade Zero-Setup AI Agent Solution
By
–
If your team needs a zero-setup AI agent that doesn't compromise on enterprise-grade security, PokeeClaw is the play!
— Charly Wargnier (@DataChaz) 8 avril 2026
→ https://t.co/G7FpIYn1zr
Helpful? A repost goes a long way!
Make sure to follow @datachaz for insights on LLMs and AI agentshttps://t.co/04cAUs01OSIf your team needs a zero-setup AI agent that doesn't compromise on enterprise-grade security, PokeeClaw is the play! → pokee.ai/pokeeclaw Helpful? A repost goes a long way! Make sure to follow @datachaz for insights on LLMs and AI agents nitter.net/DataChaz/status/204181… Charly Wargnier (@DataChaz) I’ve been testing a new AI agent that actually takes enterprise security seriously. Meet PokeeClaw by @Pokee_AI. → Enterprise-secure → Zero setup → 70% fewer tokens → 1,000+ app integrations 🔥 3 wild use cases 🧵↓ 1/ Google Drive connection and deep analysis — https://nitter.net/DataChaz/status/2041811417003855873#m
-
PokeeClaw: Encrypted vault protects API keys with approval pauses
By
–
3/
— Charly Wargnier (@DataChaz) 8 avril 2026
Here's what that looks like in practice.
While PokeeClaw handles your emails, news feeds, and socials, your API keys stay locked in an encrypted vault.
Every action is logged…
…and it literally pauses for your approval before sending 👀 pic.twitter.com/fdlShmtj4B3/ Here's what that looks like in practice. While PokeeClaw handles your emails, news feeds, and socials, your API keys stay locked in an encrypted vault. Every action is logged… …and it literally pauses for your approval before sending 👀
-

Enterprise Security: PokeeClaw Isolates Credentials in Cloud Sandbox
By
–
2/ Enterprise security shouldn't be an afterthought. ❌ OpenClaw: Exposes plaintext keys (major red flag 🚩) ❌ Local setups: Exposes your hardware ✅ PokeeClaw: An isolated cloud sandbox where credentials never hit the runtime The result? Zero credential exposure, Day-1 RBAC, and fully isolated, enterprise-owned data ↓
-
PokeeClaw: Enterprise AI Agent with Enhanced Security and Integration
By
–
I’ve been testing a new AI agent that actually takes enterprise security seriously.
— Charly Wargnier (@DataChaz) 8 avril 2026
Meet PokeeClaw by @Pokee_AI.
→ Enterprise-secure
→ Zero setup
→ 70% fewer tokens
→ 1,000+ app integrations 🔥
3 wild use cases 🧵↓
1/ Google Drive connection and deep analysis pic.twitter.com/eJEW85wOJ6I’ve been testing a new AI agent that actually takes enterprise security seriously. Meet PokeeClaw by @Pokee_AI. → Enterprise-secure → Zero setup → 70% fewer tokens → 1,000+ app integrations 🔥 3 wild use cases 🧵↓ 1/ Google Drive connection and deep analysis
-
Mythos Model Access Security Trade-offs Future Generations
By
–
If Mythos is just a good model that happens to be exceptional at security, the access question gets harder with every generation, not easier.
-
Prompt Injection vs Lethal Trifecta: Naming Clarity Discussion
By
–
here is @simonw on the difference in self evident naming between “prompt injection” and “lethal trifecta”https://t.co/wSSVgEZeWM pic.twitter.com/ovDqwQRvq5
— swyx 🐣 (@swyx) 8 avril 2026here is @simonw on the difference in self evident naming between “prompt injection” and “lethal trifecta” https://
share.snipd.com/snip/6ca275d5-
c16e-47fb-b779-77ae5ee7cb56
…