Sandboxes Won't Save You From OpenClaw

AI agent misbehavior isn't a sandbox problem—it's a permissions problem.

Aakash Japi · February 24, 2026

So far in 2026, OpenClaw has deleted a user's inbox, spent 450k in crypto, installed countless pieces of malware, and attempted to blackmail an OSS maintainer. And it's only been two months.

The (tech-adjacent) world is responding. Paranoia about misaligned AI is going semi-mainstream. X and LinkedIn are awash in prompt injection stories and not-so-subtle company adverts disguised as warnings. Suddenly, arguments about rogue intelligence aren't dismissed with an eye-roll. Suddenly, people see agents burning someone's crypto or deleting their email inbox, and they're looking for solutions.

And if you read enough, it seems like they've found one: sandboxes.

Sandboxes are nothing new. They're just an application of virtualization, and virtualization is ancient by software standards. IBM introduced it on mainframes back in the late 1960s, and despite massive change in the underlying tech, the core objective is the same: sandboxes isolate workloads from each other while providing each workload a full machine abstraction.

Today, the trending "workload" is an AI agent. The thinking goes: if we run the agent in a sandbox, and the sandbox doesn't "leak," then the agent can't delete my files, read my cryptocurrency wallet, or clear my inbox, and so I am safe.

Except, of course, you aren't. You probably noticed that none of the agentic misbehavior I mentioned above involved filesystem access. Instead, every major issue involved a third-party service, and in each case, the user explicitly granted the agent access to that service. The agent was then prompt-injected or misinterpreted its own instructions, did something unexpected, and nothing was in place to block it.

There isn't a sandbox in the world that prevents this. Sandboxes are useful for isolating workloads from each other, but agents primarily need to be isolated from you. The only things the sandbox gives you here are filesystem protections, which keep the agent from rm -rf'ing your root, and network protections, which limit which websites the agent can access. This is definitely useful. But it's not at all sufficient for safety.

The underlying issue is that there's a tension between the usefulness of a general-purpose agent like OpenClaw and the restrictions that a secure deployment would necessitate. For example:

You obviously shouldn't give it access to your accounts. But an agent running on its own account can't handle my calendar or respond to my emails, and that's what I want it to do.
Similarly, you shouldn't give OpenClaw access to money. But I want an agent that takes photos of my pantry, sees what I'm running low on, and orders new groceries for me, and that requires my credit card.

And so on, ad infinitum. People see OpenClaw as an early iteration of a real-life Jarvis, the personal assistant from Iron Man that ran most of Tony Stark's life. They want it to book flights for them and negotiate their rent and handle their auto-insurance claims, and in terms of capability, it can. We just can't prevent it from being hijacked.

The product this market demands isn't a sandbox; it's some form of agentic permissions. What you want is to grant an agent a limited degree of latitude in each account. I want to connect my credit card, but only let the agent spend less than $30 a day, and only on Amazon Fresh. I want to connect my email, but only allow sending or replying to a few specific addresses, and every message needs my approval before sending.
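To make the credit card example concrete, here is a minimal sketch of what such a check could look like if it lived in a gateway between the agent and the card. Everything here is hypothetical: SpendPolicy and authorize are names invented for illustration, not part of any existing payments API, and a real system would enforce this on the issuer's side rather than in the agent's process.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal


@dataclass
class SpendPolicy:
    """Hypothetical per-card policy: a daily cap plus a merchant allowlist."""

    daily_limit: Decimal
    allowed_merchants: frozenset[str]
    # Running total per calendar day, kept by the gateway, not by the agent.
    _spent: dict[date, Decimal] = field(default_factory=dict, repr=False)

    def authorize(self, merchant: str, amount: Decimal, today: date) -> bool:
        """Record the charge and return True only if it fits the policy."""
        if merchant not in self.allowed_merchants or amount <= 0:
            return False
        spent_today = self._spent.get(today, Decimal("0"))
        # The day's total must stay strictly under the limit.
        if spent_today + amount >= self.daily_limit:
            return False
        self._spent[today] = spent_today + amount
        return True


policy = SpendPolicy(Decimal("30"), frozenset({"Amazon Fresh"}))
today = date(2026, 2, 24)
print(policy.authorize("Amazon Fresh", Decimal("12.50"), today))    # allowed
print(policy.authorize("Some Other Shop", Decimal("5.00"), today))  # denied
```

The point of the sketch is where the check runs: a hijacked agent can ask for anything, but the decision is made by code it can't rewrite.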

The closest we have to this right now is OAuth, which is designed for humans. The permissions it offers are far too coarse. Gmail, for example, has "send emails" as a single permission grant. GitHub has "make pull requests" as another. Payments have basically nothing. We rely on the goodwill (and the desire to not be criminally prosecuted) of e-commerce platforms.

For agents, you need to specify these with much more granularity.

What do you actually need? Let's go back to the examples above:

For Gmail, the integration flow should involve someone walking through their contacts and assigning each one a permission level (send without approval, or require approval). For the latter category, messages should sit in a queue until the user manually approves them, at which point the system calls back to the agent.
For credit card limits, the purchase API should be entirely different. The agent should never see the actual card number. Instead, it could request a new credit card number for each purchase, which should only approve transactions of a specific size from a specific seller, and every request for a number should go through the user. This means the agent doesn't even have a credit card number to leak, and can't reuse a prior approval for subsequent actions.
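Both flows above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not a real Gmail or card-network API: Outbox, CardIssuer, ContactRule, and the "tok_" prefix are all hypothetical names, and the user's approval is modeled as a plain method call or boolean.

```python
import secrets
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Callable, Optional


class ContactRule(Enum):
    SEND_WITHOUT_APPROVAL = "send without approval"
    REQUIRE_APPROVAL = "require approval"


class Outbox:
    """Hypothetical mail gateway sitting between the agent and the mailbox."""

    def __init__(self, rules: dict[str, ContactRule],
                 on_approved: Callable[[str], None] = lambda ticket: None):
        self.rules = rules              # contacts the user pre-approved
        self.on_approved = on_approved  # callback into the agent
        self.pending: dict[str, tuple[str, str]] = {}  # ticket -> (to, body)
        self.sent: list[tuple[str, str]] = []

    def submit(self, to: str, body: str) -> Optional[str]:
        """Called by the agent. Returns a ticket id if the message is queued."""
        rule = self.rules.get(to)
        if rule is None:
            raise PermissionError(f"{to} is not a pre-approved contact")
        if rule is ContactRule.SEND_WITHOUT_APPROVAL:
            self.sent.append((to, body))
            return None
        ticket = secrets.token_hex(8)
        self.pending[ticket] = (to, body)  # waits for the user
        return ticket

    def approve(self, ticket: str) -> None:
        """Called by the user, never by the agent."""
        self.sent.append(self.pending.pop(ticket))
        self.on_approved(ticket)


@dataclass
class CardGrant:
    """Hypothetical single-use card number scoped to one seller and amount."""
    number: str
    merchant: str
    max_amount: Decimal
    used: bool = False


class CardIssuer:
    def __init__(self) -> None:
        self.grants: dict[str, CardGrant] = {}

    def issue(self, merchant: str, max_amount: Decimal,
              user_approved: bool) -> CardGrant:
        """Every request for a number goes through the user."""
        if not user_approved:
            raise PermissionError("user declined the card request")
        grant = CardGrant("tok_" + secrets.token_hex(8), merchant, max_amount)
        self.grants[grant.number] = grant
        return grant

    def charge(self, number: str, merchant: str, amount: Decimal) -> bool:
        grant = self.grants.get(number)
        if grant is None or grant.used:
            return False  # unknown number, or approval already spent
        if merchant != grant.merchant or not (0 < amount <= grant.max_amount):
            return False  # wrong seller or wrong size
        grant.used = True
        return True
```

In both cases the agent only ever holds a handle (a ticket id or a one-off number), so a prompt-injected agent has nothing durable to leak or replay.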

You can extend this idea to every single product we want to connect to an agent. The point is clear: we need to design new interfaces for agents because agents are a fundamentally new type of actor.

It's obvious why this doesn't exist yet. I hear the objections in my head already. Every surface has a different permissions model and different assets to secure, and because of this, it's very hard to build middleware that enforces this across products. You either need every product to build this itself, or you need industry consortiums to create and enforce a standard among their members. I think what the moment demands is the next Plaid, which wrangles a bunch of disparate operators into a single, unified API. And like Plaid, I do think the first place this happens is in finance: there's just too much money on offer.

But one thing is clear: we definitely do not need yet another agent sandbox. Wrap OpenClaw in Seatbelt, bubblewrap, or Landlock, and move on. It's not enough, but neither is anything else.

