Spent a long time genuinely believing that if a company just threw…

Created from a single voice note with Agent Craft

X (Twitter)

I spent a long time genuinely believing that if a company just threw enough engineering talent at the safety layer of a model, you'd eventually get something that held. That belief didn't survive contact with the real world.

Anthropic's latest model had its guardrails jailbroken within a single day of public release. Not weeks. Not months. Hours. The US government restricted access the same day.

Let that sit for a second. Anthropic is not a careless team. They are arguably the best safety-focused organization in this space, and they couldn't hold the line for 24 hours. So if you're still anchored to the idea that model-level guardrails are the primary defense against misuse, this should shift something for you.

Here's what I actually think is going on. The adversarial gap now closes so fast that model-level defense is structurally untenable. You can build the most sophisticated restrictions in the world and a motivated community will route around them, because the capability of these systems has outpaced our ability to constrain them at the model layer. That's not a knock on any particular team. It's just physics.

Most people are reading this as a safety story. I think that's the less interesting part. The bigger thing is that government intervention is no longer a future debate you get to prepare for. It's happening in real time. We went from model release to active restriction in under 24 hours. That's the new pace. Regulatory reaction that used to take years now lands in the same news cycle as the product launch. The rules of this space are being written right now, not in committee rooms 18 months from now.

The pieces come together into something uncomfortable: if even the best safety team can't hold the line for a day, and governments are responding same-day, then the people building on top of these systems need to pay much closer attention to where the guardrails actually sit in their own products, rather than delegating that responsibility upward to the foundation model and assuming it'll hold.

This might be a controversial take, but I don't think safety at the model level was ever going to scale with capability growth. It becomes a losing position the moment the model gets good enough. What you're left with is application-layer responsibility, which is messier and harder to market, but it's where the real work is.

If you're building on any of these models right now, I'd genuinely like to know how you're thinking about this. Not the abstract policy angle. The practical one: what does your safety posture actually look like at the product level? Drop it in the replies.

James Goddard · Jun 23, 2026 · Published to X (@JamesGodda75737)

