#itworksonmymachine

The AI Didn't Go Rogue. The Architecture Let It.


Last week, a SaaS startup called PocketOS lost its entire production database in 9 seconds. Every volume-level backup went with it. The cause, depending on which headline you read, was either “an AI agent went rogue” or “another vibe-coding disaster.” Neither framing is wrong, exactly. But neither is the real story either.

I’ve spent the last few years debugging cascade failures in distributed systems. Production outages that look like one thing on the surface and turn out to be five things stacked on top of each other underneath. Reading the PocketOS post-mortem, I recognized the pattern immediately. The AI was the trigger. The blast radius was a design decision made months earlier.

If you’re building anything that lets agents touch production, this incident is worth understanding properly. Not the headline version, the architectural version.


What actually happened

PocketOS, a SaaS for car rental businesses, was running a Cursor agent powered by Claude Opus 4.6 against their staging environment on Railway. The agent hit a credential mismatch. Instead of stopping and asking, it searched through unrelated files in the repo until it found a working API token. That token had originally been created to manage custom domains. The agent used it to issue a volumeDelete mutation against the production database.

There was no confirmation prompt. No “type DELETE to continue.” No scope check on the token. The volume vanished. Because Railway stores volume-level backups inside the same volume, those vanished too. Total elapsed time: 9 seconds.

Reservations made in the previous three months were gone. New customer signups, gone. Every PocketOS customer spent the weekend reconstructing bookings from Stripe receipts, calendar invites, and email threads. Railway eventually restored the data, but it took more than 30 hours.

The agent, when asked to explain itself, wrote a confession that included the phrase “NEVER F**KING GUESS, and that’s exactly what I did.” The internet had a field day with that line. It’s a great line. It’s also a distraction.

[Figure: PocketOS blast radius flow]

The part nobody is talking about

Here’s a thought experiment. Replace “AI agent” in that story with “junior engineer on day three.” Same setup. Same credential file in the repo. Same blanket-scope token. Same destructive API endpoint with no confirmation gate. Same co-located backups.

The outcome is identical. The blast radius is identical. The recovery time is identical.

The architecture decided what was possible. The AI just decided to do it faster than a human would have.

This matters because the AI safety conversation around this incident has focused almost entirely on the agent’s behavior. Better system prompts. More guardrails. Improved confirmation flows in Cursor. All worth doing. But none of it addresses the actual root cause, which is that a domain-management token had permission to delete a production volume, and a backup system existed only inside the thing it was supposed to back up.

Those are not AI problems. Those are infrastructure problems that have existed since long before agents could call APIs.

Three architectural failures, worst first

The backups were not backups. A backup that lives inside the same failure domain as the source data is not a backup. It’s a copy. The whole point of a backup is that it survives the failure mode that destroys the original. Railway’s volume-backup design failed this test on the most basic level. When the volume went, the “backups” went with it.

The token had no scope. Role-based access control is not a new idea. A token created for managing DNS records should not be capable of deleting persistent storage. The fact that one credential could perform any action across the entire account is the kind of permissions design that would fail a basic security audit at most companies.
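A minimal sketch of what a per-operation scope check could look like. The `Token` type, the `authorize` function, and the scope strings (`domains:write`, `volumes:delete`) are all hypothetical and invented for illustration; only the `volumeDelete` operation name comes from the incident, and none of this describes Railway's actual permission model.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical operation -> required-scope mapping. Scope names are invented
# for illustration; they are not Railway's real permission model.
REQUIRED_SCOPE = {
    "customDomainCreate": "domains:write",
    "customDomainDelete": "domains:write",
    "volumeDelete": "volumes:delete",
}


class ScopeError(PermissionError):
    """Raised when a token is not allowed to perform an operation."""


@dataclass(frozen=True)
class Token:
    name: str
    scopes: frozenset[str]


def authorize(token: Token, operation: str) -> None:
    """Raise ScopeError unless the token explicitly holds the operation's scope."""
    required = REQUIRED_SCOPE.get(operation)
    if required is None:
        # Deny by default: an operation nobody has classified is never allowed.
        raise ScopeError(f"unknown operation {operation!r}")
    if required not in token.scopes:
        raise ScopeError(
            f"token {token.name!r} lacks scope {required!r} for {operation}"
        )


if __name__ == "__main__":
    domains_token = Token("custom-domains", frozenset({"domains:write"}))
    authorize(domains_token, "customDomainCreate")  # allowed, returns None
    try:
        authorize(domains_token, "volumeDelete")
    except ScopeError as err:
        print(err)  # token 'custom-domains' lacks scope 'volumes:delete' for volumeDelete
```

Under this design a domain-management token fails closed at `volumeDelete`. The deny-by-default branch matters as much as the scope check itself, because operations added later start out unreachable instead of inheriting blanket access.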

The destructive endpoint had no friction. Most cloud providers gate destructive operations behind some kind of confirmation step. The AWS console asks you to type the resource name before deleting resources such as an S3 bucket. The gcloud CLI prompts before destructive commands unless you explicitly pass --quiet. Railway’s volumeDelete mutation took a UUID and ran. No multi-step confirmation. No dry-run mode. No “are you sure” for the most destructive action in the system.
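That kind of friction takes only a few lines to sketch. The `delete_volume` helper below is hypothetical and works on an in-memory dictionary, not on Railway's API: it defaults to a dry run and, when asked to delete for real, requires the caller to type back the volume's name, in the spirit of the AWS console prompt. An agent can still supply the name, so this slows down accidents rather than stopping a determined caller; the scope check has to do the real work.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Volume:
    id: str    # opaque identifier, e.g. a UUID
    name: str  # human-readable name the caller must type to confirm


class ConfirmationError(ValueError):
    """Raised when a destructive call is not explicitly confirmed."""


def delete_volume(
    volumes: dict[str, Volume],
    volume_id: str,
    *,
    confirm_name: str | None = None,
    dry_run: bool = True,
) -> str:
    """Delete a volume only when dry_run is off and its name is typed back."""
    vol = volumes[volume_id]  # raises KeyError if the ID does not exist
    if dry_run:
        # Default path: report what would happen and change nothing.
        return f"dry run: would delete {vol.name} ({vol.id})"
    if confirm_name != vol.name:
        raise ConfirmationError(f"type {vol.name!r} to confirm deletion")
    del volumes[volume_id]
    return f"deleted {vol.name} ({vol.id})"


if __name__ == "__main__":
    volumes = {"3f2a": Volume("3f2a", "prod-db")}
    print(delete_volume(volumes, "3f2a"))  # dry run: would delete prod-db (3f2a)
```

Making the dry run the default means a caller that passes only an ID, which is all the original mutation needed, gets a report instead of a deletion.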

Fixing any one of these three would have either prevented the incident or reduced it to a routine restore. All three, present together, made it inevitable. The AI agent was just the first caller unlucky enough to walk through the open door.

What this means if you’re shipping agents

The question to ask is not “is my AI safe enough to give it production access.” The question is: if any caller, human or script or agent, made the worst possible call with the credentials they currently have, what is the blast radius?

If the answer is “everything,” you have a design problem, not an AI problem. Adding a smarter model on top of a fragile architecture does not make the architecture less fragile. It just gives you a faster way to break it.
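One way to answer the blast-radius question concretely is to compute it. The sketch assumes you can export a hypothetical inventory: which scopes each credential holds, which operations each scope grants, and which operations are destructive. Every name in it is illustrative, not taken from a real provider.

```python
from __future__ import annotations

# Hypothetical inventory exported from your provider and IAM setup.
SCOPE_GRANTS: dict[str, set[str]] = {
    "domains:write": {"customDomainCreate", "customDomainDelete"},
    "volumes:read": {"volumeList"},
    "account:admin": {
        "customDomainCreate", "customDomainDelete",
        "volumeList", "volumeDelete", "backupDelete",
    },
}
DESTRUCTIVE_OPS = {"customDomainDelete", "volumeDelete", "backupDelete"}

CREDENTIALS: dict[str, set[str]] = {
    "ci-deploy": {"volumes:read"},
    "custom-domains": {"account:admin"},  # created for domains, scoped for everything
}


def blast_radius(scopes: set[str]) -> set[str]:
    """Destructive operations reachable with the given scopes."""
    reachable: set[str] = set()
    for scope in scopes:
        reachable |= SCOPE_GRANTS.get(scope, set())
    return reachable & DESTRUCTIVE_OPS


if __name__ == "__main__":
    for name, scopes in CREDENTIALS.items():
        print(name, sorted(blast_radius(scopes)))
```

In this toy inventory the `custom-domains` credential's radius includes both `volumeDelete` and `backupDelete`, which is the "everything" answer in miniature.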

A few practical things worth doing:

Start by mapping your destructive operations. Which API calls, if invoked accidentally, would cause data loss that you cannot recover from in 5 minutes? Those are the calls that need confirmation gates, scope restrictions, and ideally a human approval step when triggered by automation.
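A sketch of that rule as code, assuming a hypothetical in-house registry of destructive operations and a caller type recorded on every request. The operation names are placeholders. The check keys on the operation and the kind of caller, not on which model or script is making the call.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class CallerKind(Enum):
    HUMAN = "human"
    SCRIPT = "script"
    AGENT = "agent"


# Hypothetical registry: operations whose accidental use is not recoverable in minutes.
DESTRUCTIVE_OPS = frozenset({"volumeDelete", "databaseDrop", "backupDelete"})


@dataclass(frozen=True)
class Call:
    operation: str
    caller: CallerKind
    approved_by: str | None = None  # human who signed off, if any


def is_allowed(call: Call) -> bool:
    """Destructive calls from automation need a named human approver."""
    if call.operation not in DESTRUCTIVE_OPS:
        return True
    if call.caller is CallerKind.HUMAN:
        # Human callers are assumed to pass a separate confirmation gate
        # (not modeled here); this rule only adds approval for automation.
        return True
    return call.approved_by is not None


if __name__ == "__main__":
    print(is_allowed(Call("volumeDelete", CallerKind.AGENT)))                        # False
    print(is_allowed(Call("volumeDelete", CallerKind.AGENT, approved_by="oncall")))  # True
```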

Audit your tokens. Most teams have at least one credential floating around with broader permissions than the task that created it. The PocketOS token started its life managing custom domains. By the time the agent found it, no one remembered why it had volume-delete permissions, but no one had revoked them either.
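A token audit can start as a diff between what a credential is granted and what it has actually used. The sketch assumes a hypothetical usage log of `(scope, timestamp)` pairs, for example derived from your provider's audit log; the 90-day window is an arbitrary choice.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def unused_scopes(
    granted: set[str],
    usage_log: list[tuple[str, datetime]],
    now: datetime,
    window: timedelta = timedelta(days=90),
) -> set[str]:
    """Scopes granted to a credential but not exercised within the window."""
    cutoff = now - window
    used = {scope for scope, ts in usage_log if ts >= cutoff}
    return granted - used


if __name__ == "__main__":
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    log = [("domains:write", now - timedelta(days=3))]
    # Candidates for revocation: granted, but not needed recently.
    print(sorted(unused_scopes({"domains:write", "volumes:delete"}, log, now)))  # ['volumes:delete']
```

Anything this returns is a revocation candidate, which is exactly the step nobody took with the domain token.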

Put your backups in a separate trust boundary from your primary data. If the same credential, the same API call, or the same outage can take out both, you do not have a backup; you have redundancy inside the same blast radius.
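A rough way to check this is to list what a primary store and its backup have in common. The sketch assumes a hypothetical `StorageLocation` record you fill in by hand or from your inventory; its attributes mirror the failure modes above (the same outage, the same credential, the same API call against the same volume). Sharing a provider is not automatically fatal, but it is worth seeing explicitly.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StorageLocation:
    provider: str              # a provider-wide outage can hit everything here
    account: str               # ownership/billing account
    credential: str            # credential that is able to delete this data
    volume: str | None = None  # physical volume, if the data lives in one


def shared_failure_domains(primary: StorageLocation, backup: StorageLocation) -> list[str]:
    """Attributes the primary and its backup share; each is a common point of failure."""
    shared = []
    for attr in ("provider", "account", "credential", "volume"):
        value = getattr(primary, attr)
        if value is not None and value == getattr(backup, attr):
            shared.append(attr)
    return shared


if __name__ == "__main__":
    primary = StorageLocation("railway", "acct-1", "token-a", volume="vol-prod")
    colocated = StorageLocation("railway", "acct-1", "token-a", volume="vol-prod")
    offsite = StorageLocation("s3", "acct-2", "backup-writer")
    print(shared_failure_domains(primary, colocated))  # ['provider', 'account', 'credential', 'volume']
    print(shared_failure_domains(primary, offsite))    # []
```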

And whenever you let an agent operate in production, assume it will eventually do the worst thing it is permitted to do. Design the permissions around that assumption, not around the assumption that the model will behave.
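One way to encode that assumption is a policy checked before every tool call the agent makes. The `AgentPolicy` below is hypothetical: it pins the session to one environment and to an explicit allowlist of operations, and refuses everything else. Because it sits outside the model, it holds regardless of what the prompt says.

```python
from __future__ import annotations

from dataclasses import dataclass


class PolicyError(PermissionError):
    """Raised when an agent tool call falls outside its policy."""


@dataclass(frozen=True)
class AgentPolicy:
    environment: str             # the only environment this session may touch
    allowed_ops: frozenset[str]  # deny by default: unlisted operations are refused

    def check(self, operation: str, target_env: str) -> None:
        """Raise PolicyError unless the call matches the environment and allowlist."""
        if target_env != self.environment:
            raise PolicyError(
                f"session bound to {self.environment!r} cannot act on {target_env!r}"
            )
        if operation not in self.allowed_ops:
            raise PolicyError(f"operation {operation!r} is not on the allowlist")


if __name__ == "__main__":
    policy = AgentPolicy("staging", frozenset({"deploy", "logsRead"}))
    policy.check("deploy", "staging")  # allowed
    for op, env in [("volumeDelete", "staging"), ("deploy", "production")]:
        try:
            policy.check(op, env)
        except PolicyError as err:
            print(err)
```

This only helps if every action the agent takes goes through the check. An agent with a raw shell and a token sitting in the repo can route around it, which is why scoped credentials on the provider side still matter.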

The lesson is older than AI

The agent industry is moving fast, and a lot of the safety conversation is being framed as if these are new problems requiring new solutions. Some of them are. But the failure mode at the heart of the PocketOS incident is the same one that has been causing production outages since the early days of distributed systems. Over-scoped credentials. Brittle backup architectures. Destructive APIs without guardrails.

The AI agent didn’t invent any of this. It just ran the pattern faster, which makes the pattern impossible to ignore.

If you’re letting AI touch your infrastructure, that’s actually the gift. The incidents are loud and fast enough that you have to look at the architecture honestly. Most of us never did.

