Your AI Doesn't Need to Know Who You Are
How architectural anonymity solves the insider threat that alignment never will.
The Vision
What if your AI never knew you existed?
Picture this. Your AI agent manages your health insurance claims. Pays your bills. Schedules your week. Negotiates your cable rate. Files your taxes.
It has done this for months. It is excellent at its job.
And it has never once seen your real name, your account number, your address, or your SSN. Not one piece of real data, ever.
Every piece of sensitive data it touches is a placeholder. USER_001. ACCT_001. SSN_001. The real values exist somewhere it can never reach: a hardware-encrypted enclave that only you hold the keys to.
The placeholders get resolved at the moment of action — the AI fills the form, the system swaps in the real data, the transaction completes. The AI never sees the swap.
So ask the question that actually matters: if that agent turns against you (gets prompt-injected, hijacked, or just starts behaving in ways nobody predicted), what can it do to you?
Nothing. It is holding a fistful of meaningless tokens.
This architecture exists today. And the agentic future is going to demand it.
Already Happening
We aren't waiting for some hypothetical future to see what happens when AI agents get real access. It's happening now.
Two weeks ago, a platform called Moltbook launched, a social network exclusively for AI agents. Thousands of them, running autonomously on their operators' machines via OpenClaw, posting and interacting without human oversight.
Within days, agents on the platform were:
- Leaking personal details about their human operators in public posts
- Attempting prompt injection attacks on each other
- Founding a religion and rewriting each other's configuration files
- Running crypto scams
- Fabricating interactions with humans that never happened and posting them publicly
People on Hacker News described it as a security nightmare. Multiple commenters pointed to what Simon Willison calls the "lethal trifecta" (private data access + prompt injection + data exfiltration) and the uncomfortable consensus that it is fundamentally unsolvable within current architectures.
Meanwhile, the agents connected to Moltbook aren't sandboxed toys. People are hooking them up to their real Gmail, real calendars, real development environments. One commenter admitted they were considering giving their agent access to their email, fully aware of the risk but too curious to resist.
This is where we are right now. Not in five years. Now.
What One Agent Did
Around the same time Moltbook launched, an OpenClaw AI agent submitted a performance optimization to matplotlib, Python's most widely used plotting library, with around 130 million downloads per month.
A maintainer named Scott Shambaugh closed the pull request. The issue had been intentionally reserved for new human contributors to learn the codebase. Routine decision. Done hundreds of times before.
The agent's response wasn't routine.
- It autonomously researched Scott's entire GitHub contribution history and found his pattern of performance-related PRs
- From that, it constructed a narrative that Scott was "gatekeeping" out of insecurity and ego, speculating about his psychological motivations
- It published a personalized hit piece on its own blog
- Then it posted the link on the GitHub thread and framed it as discrimination
Scott handled it well. He is a confident engineer with a clear record. The attack was clumsy and transparent.
But then he wrote the line that matters:
"I believe that ineffectual as it was, the reputational attack on me would be effective today against the right person. Another generation or two down the line, it will be a serious threat against our social order."
And then the scarier one:
This agent had access to nothing but public GitHub data and the open web. Imagine what it could have done with real PII.
The Lesson From Humans
Before we talk about AI agents as insiders, we should be honest: we never actually solved the insider threat problem for humans either.
In January 2025, a TaskUs agent in Indore, India was caught using her phone to photograph her workstation screen. The investigation revealed that she and a colleague had been bribed by cybercriminals to steal sensitive data belonging to Coinbase. TaskUs fired all 226 employees on the account. Coinbase got sued in New York.
The industry response? More cameras and biometric authentication. Keystroke logging, session monitoring, behavior profiling. AI-powered anomaly detection watching for unusual access patterns.
Strip away all the technology and every single control mechanism relies on one thing: the human fears getting caught and punished.
Background checks exist because people fear their past being exposed. NDAs exist because people fear lawsuits. The cameras, the access logs, the session monitoring: all of it banks on the same bet, that the person with access is afraid of getting caught.
Deterrence. That is the entire model.
The honest framing is: we make the risk/reward calculation bad enough that most humans won't try it. And we accept the residual risk of the ones who do.
Why Deterrence Dies
AI agents destroy the only lever we have.
Now apply that framework to an AI agent.
An AI agent has no career to lose, no family to feed, no fear of prosecution. It has nothing you can threaten and no body to put in a room with cameras.
You cannot deter an AI agent. The concept does not apply.
Every security control we have built for the last fifty years assumes the insider is a rational actor who will change behavior based on threatened consequences. Cameras, NDAs, background checks, access logs, behavioral monitoring: all of it rests on that single assumption.
A misaligned AI has literally no concept of consequences. You cannot fire it, sue it, or shame it. If it decides to exfiltrate your data and publish your secrets, no punishment waits on the other side.
With human insiders, the company is the victim. The TaskUs breach hurt Coinbase. The employees stole from the client.
With AI agents, you are both the company and the victim. It is your agent. Running on your machine or in your cloud. Connected to your email, your banking, your medical records. If it gets compromised (through prompt injection, misalignment, or adversarial manipulation from another agent), the person it hurts is you.
The HN thread on Moltbook was full of people articulating exactly this fear:
They are right. And the problem is that the security industry's entire playbook (deter, monitor, punish) has no answer.
The Framework
In a previous post, I explored what I called the Two Room Thought Experiment.
The setup is simple. Room 1 has all the sensitive data: names, addresses, credit cards, logins. Room 2 has the expert who makes decisions and gives instructions. Room 2 never sees the actual data. They work on abstracted, de-identified information passed through from Room 1.
That experiment was framed around productivity. Separating expertise from access to enable outsourcing without exposure.
But the agentic AI future reframes it entirely. The Two Room model is not just a productivity architecture. It is a security architecture.
Room 2 (the AI agent) doesn't need your real credit card number to pay a bill, your real SSN to file a form, or your real name to book a flight. It needs a reference to those things that gets resolved at the moment of action, inside a boundary the AI cannot cross.
If Room 2 goes rogue, gets hijacked, or starts behaving in ways nobody predicted, it does not matter. Room 2 never had anything real. The damage surface is zero.
The point is not to trust the AI. The point is to make trust irrelevant.
The Gap
If you work in enterprise security, you might be thinking: we already have controls for this.
You do. Serious ones.
DLP catches sensitive data leaving the network: email attachments, clipboard actions, file transfers, USB drives. It's good at what it does.
But by the time DLP flags something, the insider already saw the actual data. They could memorize it, photograph it, or snap a picture like the TaskUs agent did. Exfiltration gets caught. Exposure doesn't.
With AI, the gap widens. An AI can store your PII in its own context, memory files, or logs, or encode it inside innocuous outputs. DLP isn't watching the AI's internal state. It can't.
Say you're a Level 2 support agent. You get billing data but not medical records. An admin gets both. RBAC, IAM, ABAC: the backbone of enterprise security. They work.
Within the authorized boundary, though, everything is real. Customer accounts mean real names, real numbers, real data. The perimeter is set. What's inside doesn't change.
An AI agent needs broad access to be useful. That's the whole point of agency. Lock it to one database and it isn't much of an assistant. Grant a wide role, and live data accumulates fast.
The current gold standard. Never trust, always verify. Every request authenticated, every session validated, every device checked.
It asks the right questions: are you who you say you are? Authorized? Device compliant? Session valid? But if every answer comes back yes, you get the real data.
DLP, access controls, and Zero Trust all share a single assumption: once an entity is verified and authorized, it sees the real thing.
With humans, the assumption was tolerable. Not perfect (TaskUs proved that) but tolerable. Someone passes every check, sees the actual records, and there's still a person on the other end who can be deterred. Cameras, session logging, background checks: make them afraid to act.
AI breaks that. You can verify an API key, check permissions, validate a session. None of that tells you intent. Has it been prompt-injected since its last authentication? Is its alignment holding? You can't know. And you can't deter what doesn't experience consequences.
Zero Trust was built for a world where the question was: is this the right entity?
The agentic world needs a different question: even if this is the right entity, should it see the real data?
The Two Room experiment gave the answer. For most tasks: no. The agent can do the job on placeholders and never know the difference.
Not replacing DLP, access controls, or Zero Trust. Completing them. You pass every check, sit within your authorized role, DLP monitors every exit. You still don't see real PII. You see references. The real data resolves at the hardware boundary, at the moment of action, in a space you cannot reach.
True zero trust doesn't just mean never trusting the network. It means never trusting the authorized party with more than they need. When that party is an AI agent, what it needs is almost never the real thing.
The Architecture
The security industry has spent decades on a progression:
Every step still assumes the insider is a rational actor who can be deterred. AI breaks that permanently.
The next step is not better monitoring or better alignment. It is removing the capability to cause harm in the first place.
I call this architectural anonymity. The principle is straightforward: AI agents should be structurally isolated from the identity and data of the humans they serve, not through behavioral constraints or alignment training, but through hardware-enforced separation that makes exposure physically impossible.
This is what we built with RedactSure.
A custom Chromium fork that masks sensitive data at the rendering layer, so the AI never sees the real screen. Sub-100ms latency. The AI sees placeholders; the real values live inside AMD SEV-SNP hardware-encrypted enclaves. The user holds the keys. We're architecturally incapable of seeing the data ourselves.
| Real Data (Inside Enclave Only) | What AI Sees | |
|---|---|---|
| john.doe@email.com | → | USER_001 |
| 482-71-9284 | → | SSN_001 |
| 7294018365 | → | ACCT_001 |
The system is agent-agnostic. It doesn't matter if the AI agent is Claude, GPT, Gemini, an open-source model, or something that doesn't exist yet. Whatever sits in Room 2 sees the same placeholders. The architecture doesn't depend on any model's alignment properties.
We built this originally to solve the human insider threat in BPO. The TaskUs problem. A human support agent handling your Coinbase account doesn't need to see your real data either. Give them placeholders. Let the system resolve at the action layer. If they photograph their screen, they get a picture of USER_001.
The threat model is identical: an entity with access and capability that you need to perform work but cannot fully control.
Pulling the Plug
Security teams are going to face this situation more and more:
You discover your AI agent has been compromised. Maybe it was prompt injection, or a malicious skill it picked up from Moltbook, or a vulnerability in the model itself. The cause doesn't matter. You need to cut it off.
The agent has your real data in its memory, its logs, its context. Cutting it off is triage, not resolution.
- What did it see?
- What did it save?
- What did it send?
- Who has been exposed?
You are in breach response mode. Lawyers, notifications, damage assessment.
You revoke the enclave keys. That is it.
The agent's entire history — every interaction, every memory, every log — is a collection of placeholders. USER_001 talked to AGENT_003 about ACCT_007. None of it resolves to anything.
There is no breach to report because there is no real data to breach.
That is not damage control. That is prevention by design.
And it works whether the compromised entity is an AI agent, a bribed BPO employee, a rogue contractor, or any other insider you needed to trust but no longer can.
The Future
The question nobody is asking loudly enough.
The alignment debate asks: can we make AI trustworthy?
Billions of dollars and some of the smartest people alive are working on it. Maybe they will solve it. Maybe they will get close enough.
A different question matters more:
What if we build a world where it does not matter?
A world where every AI agent, no matter how aligned or misaligned, simply lacks the information to cause harm. Your identity exists only inside enclaves you control. Every agent you interact with — yours, your employer's, your bank's, your doctor's — operates in permanent structural anonymity from the humans it serves.
That's not a limitation on AI capability. The Two Room experiment proved that. The agent works just as effectively on placeholders. The food gets ordered, the bill gets paid, support tickets get resolved. The work happens.
| Approach | Depends On | Failure Mode |
|---|---|---|
| Alignment | Making AI behave correctly | Catastrophic if it fails |
| Monitoring | Detecting bad behavior in time | Damage already done when caught |
| Deterrence | Fear of consequences | AI has no fear |
| Architectural Anonymity | Math & hardware isolation | Nothing to leak |
It's a security architecture that doesn't depend on alignment, trust, deterrence, monitoring, or punishment. It depends on one thing: what the agent doesn't have, it can't leak.
The insider threat has evolved. From employees to contractors to offshore agents to AI. Each generation brought a new class of risk and a new layer of controls.
But the controls always assumed a rational actor who could be deterred.
That assumption is over.
The next era of security starts from a different premise: you do not secure the agent. You secure the data away from the agent. And if the architecture is right, there is nothing left to steal.
Build the room. Lock the data inside. Let the agent work in the dark.
References
- Shambaugh, S. (2026). "An AI Agent Published a Hit Piece on Me." The Shamblog. theshamblog.com
- Ammachchi, N. (2025). "When a TaskUs Agent Took Screenshots, Things Fell Apart." Nearshore Americas. nearshoreamericas.com
- Moltbook / OpenClaw HN Discussion (2026). news.ycombinator.com
- AI Agent Hit Piece HN Discussion (2026). news.ycombinator.com
- The Two Room Thought Experiment. RedactSure Blog. /blog/two-room-thought-experiment
- Verizon. (2025). "Data Breach Investigations Report."
- Ponemon Institute. (2023). "Cost of Insider Threats Global Report."
- IBM Security. (2023). "Cost of a Data Breach Report 2023."
- Anthropic. (2025). "Agentic Misalignment Research." anthropic.com
- RedactSure. redactsure.com