Safe defaults

Agents are fast. That is good when they get something right and bad when they get something wrong, because errors scale too. The Prysm:ID stack limits the blast radius in three layers: authentication scoped to the human, human confirmation on destructive operations, and strict REST API parity (no MCP “special modes” that the API itself wouldn’t allow).

Layer 1 — Authentication: the agent acts as you

The MCP server (@prysmid/mcp) authenticates via the OAuth device flow (RFC 8628). You sign in once at auth.prysmid.com, and the server caches an access_token (valid ~12 hours) and a refresh_token (valid ~30 days) at ~/.config/prysmid-mcp/token.json.
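For readers unfamiliar with the device flow, here is a minimal sketch of the polling step that RFC 8628 describes. It is not the @prysmid/mcp implementation: the `post` transport, the token URL, and the client ID are all assumptions made for illustration. Only the grant type and the error codes come from the RFC.

```python
import time
from typing import Callable, Dict

# Grant type for the device access token request (RFC 8628, section 3.4).
DEVICE_GRANT = "urn:ietf:params:oauth:grant-type:device_code"

# Hypothetical transport: sends a form-encoded POST and returns the decoded
# JSON body, for both success and error responses.
Post = Callable[[str, Dict[str, str]], Dict[str, object]]


def poll_for_token(post: Post, token_url: str, client_id: str,
                   device_code: str, interval: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep,
                   max_attempts: int = 120) -> Dict[str, object]:
    """Poll the token endpoint until the user approves or the flow fails."""
    for _ in range(max_attempts):
        body = post(token_url, {
            "grant_type": DEVICE_GRANT,
            "device_code": device_code,
            "client_id": client_id,
        })
        error = body.get("error")
        if error is None:
            return body  # success: the body carries the tokens
        if error == "authorization_pending":
            pass  # the user has not finished signing in yet; keep polling
        elif error == "slow_down":
            interval += 5  # RFC 8628, section 3.5: add 5 seconds and continue
        else:
            # access_denied, expired_token, or anything unexpected
            raise RuntimeError(f"device flow failed: {error}")
        sleep(interval)
    raise TimeoutError("gave up waiting for user approval")
```

Injecting `post` and `sleep` keeps the loop testable without a network; a real client would also persist the returned tokens and refresh them before they expire.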

Consequences:

The agent can only operate on workspaces you have access to. It is not a universal “service key”: it uses your own credentials, delegated via OAuth, so the server applies the same validations it applies to the dashboard.
Every workspace endpoint validates membership. Even if the agent guesses someone else’s workspace slug, the handler returns 404 (not 403, to avoid leaking that the workspace exists), the same treatment the dashboard gets.
You can revoke access at any time. Sign out of Prysm:ID or delete token.json and the agent loses its authentication.
The audit log tags every action with actor=user:<your-email>. By design there is no separate bot actor: whatever the tool did, it did on your behalf, so it is recorded as you.
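The 404-instead-of-403 behavior can be illustrated with a small sketch. The `Workspace` type and the `get_workspace` function are hypothetical and stand in for whatever the real handler does; the point is only that "does not exist" and "exists, but you are not a member" share one code path.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Workspace:
    slug: str
    members: Set[str] = field(default_factory=set)  # member emails (assumed shape)


def get_workspace(workspaces: Dict[str, Workspace], slug: str,
                  user_email: str) -> Tuple[int, Optional[Workspace]]:
    """Return (status, workspace), hiding workspaces the caller cannot see."""
    ws = workspaces.get(slug)
    # A missing workspace and a non-member caller take the same branch, so the
    # response cannot be used to probe which slugs exist.
    if ws is None or user_email not in ws.members:
        return 404, None
    return 200, ws
```

A caller who guesses a valid slug they do not belong to gets exactly the same answer as one who invents a slug.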

Layer 2 — Human confirmation on destructive ops

The official handoff prompts (Claude Code ES/EN, Antigravity ES/EN) explicitly instruct the agent:

For destructive actions (delete_workspace, delete_oidc_app, delete_idp), ask EXPLICIT confirmation before each call — I’d lose users + apps + IdPs irreversibly.

The agent doesn’t auto-confirm. If you say “no” or “wait”, the agent stops.

Operations the prompt marks as destructive

Tool	Why
`delete_workspace`	Irreversible. Drops the entire Zitadel instance + DNS + SMTP.
`delete_oidc_app`	Breaks any active logins of that app.
`delete_idp`	If it is the only IdP and password sign-in and registration are off, the workspace is left without a sign-in method.
`delete_user`	Loses sessions, historical attribution, etc.
`set_spending_cap`, when it lowers the cap below current usage	May cut billing abruptly.

Operations that DON’T require confirmation (and why)

Action	Why
Any reversible creation (`create_workspace`, `create_oidc_app`, `add_idp`)	If the agent creates one too many, you delete it. Low cost.
Reads (`list_*`, `get_*`)	Read-only; nothing is modified.
`update_branding`	Changing logo / colors is non-destructive; reversible in one call.
`update_login_policy`	Curated tools auto-promote the per-org override without touching other orgs; changes are reversible via PATCH.
`set_spending_cap` upward	Paying more never broke a customer (cost: leaves overage cap higher, reversible).
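The two tables above amount to a small classification rule. The sketch below mirrors them in code purely as an illustration: in Prysm:ID the gate lives in the handoff prompt, not in a function like this. The `cap` argument name and the `confirm` and `call` callbacks are assumptions.

```python
from typing import Any, Callable, Mapping, Optional

# Tools the first table lists as destructive regardless of their arguments.
ALWAYS_DESTRUCTIVE = frozenset({
    "delete_workspace", "delete_oidc_app", "delete_idp", "delete_user",
})


def needs_confirmation(tool: str, args: Mapping[str, Any],
                       current_usage: float = 0.0) -> bool:
    """Decide whether a tool call should pause for the human."""
    if tool in ALWAYS_DESTRUCTIVE:
        return True
    if tool == "set_spending_cap":
        # `cap` is a hypothetical argument name. Only a cap below current
        # usage is gated; raising the cap passes straight through.
        return args["cap"] < current_usage
    return False  # creations, reads, and reversible updates


def run_tool(tool: str, args: Mapping[str, Any],
             call: Callable[[str, Mapping[str, Any]], Any],
             confirm: Callable[[str, Mapping[str, Any]], bool],
             current_usage: float = 0.0) -> Optional[Any]:
    """Run `call` unless the action is destructive and the human declines."""
    if needs_confirmation(tool, args, current_usage) and not confirm(tool, args):
        return None  # the human said "no" or "wait": stop here
    return call(tool, args)
```

Keeping the rule in one pure function makes it easy to audit against the tables whenever a tool is added.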

Layer 3 — Strict REST API parity

The MCP server has no “special modes.” Every MCP tool ultimately makes the same HTTP call the dashboard would to api.prysmid.com. This means:

There’s no MCP endpoint the browser couldn’t hit. If the dashboard won’t let you delete X without a modal confirmation, the agent can’t silently do it either: the handler rejects the request with a 4xx, applying the same validation.
Rate limits are the same. A runaway agent trying to create 1000 workspaces hits the same cap on the REST API, not a special MCP cap.
There’s a single audit log. Both dashboard and agent activity appear in the same feed with the same actor=user:<email>.
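One way to picture API parity is to treat each MCP tool as nothing more than a named HTTP request. The sketch below assumes that shape; the routes in `ROUTES` are invented for illustration and are not the documented Prysm:ID paths. Only the api.prysmid.com host comes from the text above.

```python
from string import Formatter
from typing import Any, Dict, Mapping, NamedTuple

API_BASE = "https://api.prysmid.com"


class HttpRequest(NamedTuple):
    method: str
    url: str
    json: Dict[str, Any]


# Hypothetical routes, for illustration only.
ROUTES = {
    "create_workspace": ("POST", "/workspaces"),
    "delete_workspace": ("DELETE", "/workspaces/{slug}"),
    "update_branding": ("PATCH", "/workspaces/{slug}/branding"),
}


def to_request(tool: str, args: Mapping[str, Any]) -> HttpRequest:
    """Translate a tool call into the plain REST request it stands for."""
    method, template = ROUTES[tool]  # unknown tool: KeyError, no side door
    # Arguments named in the path template go into the URL; the rest form
    # the JSON body.
    path_params = {name for _, name, _, _ in Formatter().parse(template) if name}
    body = {k: v for k, v in args.items() if k not in path_params}
    return HttpRequest(method, API_BASE + template.format(**args), body)
```

Because the translation layer adds no behavior of its own, validation, rate limiting, and audit logging all happen where they would for a browser request.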

Auditing what your agent did

In app.prysmid.com → audit (once the panel ships; today the audit log is available only via the API), filter by:

Your email as actor → see everything that happened “as you”, including agent actions.
The agent’s session time window → all agent activity gets grouped.
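Until the panel ships, the two filters above can be applied to entries fetched from the API. The sketch below assumes each entry is a mapping with an `actor` string and an `at` timestamp; those field names are guesses for illustration, not the documented response shape.

```python
from datetime import datetime
from typing import Any, Iterable, List, Mapping


def actions_in_window(entries: Iterable[Mapping[str, Any]], email: str,
                      start: datetime, end: datetime) -> List[Mapping[str, Any]]:
    """Keep entries recorded as `email` between `start` and `end`, inclusive."""
    actor = f"user:{email}"  # actor format as described in this document
    return [e for e in entries
            if e["actor"] == actor and start <= e["at"] <= end]
```

The result includes anything you did by hand in the dashboard during the same window, since both kinds of activity carry the same actor.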

There’s no separate via_mcp flag in the audit log because the agent is you via device flow. If you need to tell the two apart, keep a sandbox workspace separate from your production one, and only let agents work in the sandbox until you trust them.

Roadmap

Approval queues: for teams where an agent proposes changes and a different human approves (not the one talking to the agent). In design.
Customer-facing service accounts: machine keys with scoped permissions, explicit expiration, audit actor=key:<id>. Pending.
Fine-grained per-tool scopes: today the access_token has a broad scope (everything your account can see). We’re debating whether a read-only scope for side-by-side agents makes sense.

Philosophy

The agent is a first-class user. But “first class” doesn’t mean “no guardrails”. It means the same respect and intelligent distrust you’d apply to a new colleague with prod access.

The gates aren’t there to slow you down — they’re there so when your agent hallucinates one day, the blast radius is recoverable, not terminal.