Agent Architecture: Security & Trust

Everyone I’ve spoken with about agents asks the same thing: “What about security?”

The concern isn’t just technical; it’s governance. If an agent makes a mistake, who’s accountable? If it accesses data, which policies apply?

In this article, I share the principles I use to answer those questions and align agent security with enterprise trust models. The context is Mezzoic, the product and project management platform I built, but the lessons apply to any organization exploring multi-agent systems.

Challenge: Trust

The biggest question in Security & Trust is accountability: who is responsible for the actions taken by (or on behalf of) the user?

My opinion is clear: accountability lies with the user and the application developers, not the agent. Agents are too new, too naive, to be trusted with unsupervised accountability. They lack mature governance models, and “guardrails” today are more marketing promise than operational reality. There are simply too many loopholes to plug with brittle, complex rules.

That’s why I avoid over-engineered guardrails altogether. Instead, I focus on three principles:

  1. Propagate the user’s context all the way from the agent to the API layer (OAuth OBO).

  2. Require Human-in-the-Loop (HITL) confirmation for any risky action.

  3. Use rules engines and workflows to determine whether actions are allowed (process- or risk-based).
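The first two principles can be sketched together: every tool call carries the user’s token, and risky calls pause for human confirmation before the downstream API is hit. This is a minimal illustration, not Mezzoic’s actual code; the `ToolCall` shape and `execute` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCall:
    name: str
    user_token: str   # the user's OAuth token, propagated end to end (OBO)
    risky: bool       # risky actions require Human-in-the-Loop confirmation

def execute(call: ToolCall,
            confirm: Callable[[ToolCall], bool],
            api: Callable[[str, str], str]) -> str:
    """Run a tool call under the user's identity, pausing for HITL on risky actions."""
    if call.risky and not confirm(call):
        return "rejected by user"
    # The downstream API receives the *user's* token, never an agent credential.
    return api(call.name, call.user_token)

# Example: a delete is risky, so the human is asked first.
result = execute(
    ToolCall("delete_records", user_token="eyJ...", risky=True),
    confirm=lambda c: True,                        # user clicked "Approve"
    api=lambda name, token: f"{name} ok as {token[:3]}...",
)
```

The key design point: the agent never holds authority of its own — it only forwards the user’s token, and the confirmation gate sits in application code, not in the prompt.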

Guardrails and Human-in-the-Loop

Guardrails ≠ governance. They may provide some protection, but they cannot ensure safety. Guardrails aren’t hard constraints; they’re soft, because they’re interpreted by the LLM, and the LLM is unconstrained and unpredictable. Human review, on the other hand, guarantees accountability where it matters.

  • Transparency over hidden controls → The agent should show its reasoning (“I plan to delete these 5 records because…”) and ask for confirmation.

  • Effort vs. impact → Guardrails are costly and brittle; human review is clearer, cheaper, and more adaptable.

  • Fail gracefully → If the agent isn’t sure, it should escalate — not guess.

👉 Use agents to accelerate safe work, not bypass human judgment in high-risk areas.

Risk Hierarchy (where Human-in-the-Loop is required)

  • Information loss → Any delete operation (soft or hard) must require explicit confirmation.

  • Information confusion → Creating or updating critical entities should always prompt user review.

  • Financial transactions → Any movement of funds requires human approval. (Specialized automated trading systems are the exception, as they rely on dedicated risk controls.)
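This hierarchy can live in application code as a simple policy table consulted before any tool runs. The action names and policy labels below are illustrative, not part of any real API:

```python
# Hypothetical mapping of action categories to HITL requirements,
# mirroring the risk hierarchy above.
RISK_POLICY = {
    "delete": "always_confirm",         # information loss (soft or hard delete)
    "create_critical": "review",        # information confusion
    "update_critical": "review",
    "transfer_funds": "always_confirm", # financial transactions
}

def requires_human(action: str) -> bool:
    """Anything not explicitly allowed defaults to no review needed here;
    a stricter deployment might invert that default."""
    return RISK_POLICY.get(action, "allow") != "allow"
```

Keeping the table in code (rather than in the prompt) makes it a hard constraint the LLM cannot talk its way around.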

Challenge: Authentication and Authorization

Where does authorization belong? Not in the agent. It belongs in the MCP servers or your APIs, which already enforce your organization’s policies. That way, agents can’t “make up” permissions — they only operate within the boundaries you already trust.

The MCP specification recommends using OAuth 2.1 for authentication and authorization, particularly when exposing MCP servers. In practice, support for OAuth 2.1 is still maturing across identity providers, so many teams continue to rely on OAuth 2.0 flows (e.g., Authorization Code with PKCE, On-Behalf-Of) to achieve the same goals.

In practice, agents should never have their own ‘superpowers.’ They should always inherit the same identity, rules, and auditability as the human user. In Mezzoic, users authenticate via OAuth 2.0 + PKCE, and downstream calls use an On-Behalf-Of-style exchange so every tool invocation carries the user’s identity and scopes. This keeps authorization centralized in the APIs while staying consistent with the MCP spec’s direction of travel.

Why On Behalf Of Matters

In Mezzoic, agents never act as their own identity. Instead, they always act On Behalf Of (OBO) the user. In practice, that means:

  • Every API call is user-scoped. The downstream token carries the user’s identity and entitlements.

  • Policies remain intact. Existing RBAC/ABAC rules apply just as if the user had called directly.

  • Audit trails stay clear. Logs show which user did what, even if the agent initiated the call.

This preserves accountability: if an agent triggers a workflow, it’s still the user’s token authorizing it.
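An OBO-style exchange is commonly implemented with OAuth 2.0 Token Exchange (RFC 8693): the agent presents the user’s access token as the subject token and receives a downstream token carrying the user’s identity. The sketch below only builds the request parameters; the client ID and scopes are placeholders, and the actual POST to your IdP’s token endpoint is left as a comment.

```python
def obo_exchange_params(user_access_token: str, client_id: str, scope: str) -> dict:
    """Form parameters for an RFC 8693 token-exchange request (OBO flow sketch)."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": user_access_token,   # the user's token, not the agent's
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "client_id": client_id,               # the agent's registered client
        "scope": scope,                       # downstream scopes, limited by the user's grants
    }

params = obo_exchange_params("eyJ...", "mezzoic-agent", "projects.read tasks.write")
# POST these form-encoded params to your IdP's token endpoint (e.g. with
# urllib.request); the response token is what the agent forwards downstream.
```

Because the subject token is the user’s, the IdP can apply MFA, conditional access, and scope restrictions centrally — the agent never mints authority of its own.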

Why not other approaches?

  • Static API keys (even per-user): Technically possible, but long-lived keys are harder to rotate, scope, and audit.

  • Agent-owned service accounts: Creates “shadow identities” with broad privileges, eroding accountability.

  • User IDs in payloads: Easy to spoof and bypasses real authorization. Shouldn’t be relied on.

👉 By contrast, OAuth with OBO reuses your IdP, MFA, and conditional access policies. Tokens are short-lived, scoped, and centrally governed.


Challenge: Data Security & Memory

Memory creates new risks because by default it has no natural boundaries. An agent could potentially “remember” across tenants, users, or projects, but should it?

This is not a technical detail to leave implicit. Data isolation must be a deliberate product decision. Teams need to code explicit constraints so memory aligns with governance requirements:

  • User scope: What information is strictly personal to a single user?

  • Team/Org scope: What knowledge can be shared safely across a team or department?

  • Tenant scope: What data must never cross boundaries between customers?

Without these rules, “helpful” memory can quickly become a security liability: an agent recalling sensitive details from the wrong context.

What this means in practice

  • Retention policies: Decide how long facts or episodes should persist, and when to expire or archive them.

  • Indexing strategy: Memory should be tagged with clear ownership (user ID, team ID, tenant ID) so lookups respect boundaries.

  • Privacy by default: If in doubt, agents should forget or re-request rather than risk leaking data across contexts.

  • Auditability: Memory writes and reads should be logged just like API calls. You want to know who stored what, and who retrieved it later.

👉 Memory can make agents feel smart, but without scoped design it can also make them dangerous. The safe approach is to treat memory like a database with permissions.
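“Memory like a database with permissions” can be sketched as a store where every write is tagged with its owner and every read filters by the caller’s scope. The `MemoryKey`/`ScopedMemory` names are illustrative; a real implementation would also add team/org scope, retention, and audit logging.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryKey:
    tenant_id: str
    user_id: str

class ScopedMemory:
    """Every fact is tagged with its owner; reads filter by the caller's scope."""
    def __init__(self) -> None:
        self._store: list[tuple[MemoryKey, str]] = []

    def write(self, key: MemoryKey, fact: str) -> None:
        self._store.append((key, fact))   # auditable: who stored what

    def read(self, key: MemoryKey) -> list[str]:
        # Privacy by default: only facts in the caller's tenant+user scope.
        return [fact for k, fact in self._store if k == key]

mem = ScopedMemory()
mem.write(MemoryKey("tenant-a", "alice"), "prefers weekly reports")
mem.write(MemoryKey("tenant-b", "bob"), "budget is confidential")
# Reads from tenant-a never see tenant-b's facts, and vice versa.
```

The important property is that isolation is enforced at read time by the store itself, not by trusting the agent to ask only for what it should see.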

Challenge: Autonomous Agents

Autonomous agents are powerful but risky when their actions have financial, legal, or safety implications. Treat agents like privileged humans: enforce least privilege, require explicit approvals, and maintain a full audit trail. Put a policy-decision point (rules engine) in front of every high-impact tool, and drive execution through a workflow engine that can require approvals, cap the blast radius, and record decisions. If the rules engine flags risk above a threshold, escalate to a human, to a second agent for consensus, or invoke a two-person rule, depending on risk appetite and use case.
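The policy-decision point described above can be as simple as a scored rule evaluated before every high-impact tool call. The threshold and decision labels here are illustrative; in production the score would come from your rules engine and the escalation would route into a workflow engine.

```python
def decide(action: str, risk_score: float, threshold: float = 0.7) -> str:
    """Policy-decision point in front of a high-impact tool (sketch).

    Below the threshold: allow and record the decision in the audit trail.
    At or above it: escalate to a human, a second agent for consensus,
    or a two-person rule, per risk appetite.
    """
    if risk_score >= threshold:
        return "escalate"   # hand off to the approval workflow
    return "allow"          # execute within the capped blast radius
```

Keeping this check outside the agent means a rogue or confused model cannot lower its own threshold — the gate is code, not prompt.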

Conclusion: Questions Teams Should Ask Before Deploying Agents

  • Accountability: If an agent takes an action, whose identity and permissions does it use? Who is ultimately responsible for the outcome?

  • Authorization: Are agents operating strictly within existing policies (RBAC/ABAC), or do they create shadow permissions?

  • Human-in-the-Loop: What types of actions (e.g., deletes, fund transfers, sensitive updates) should always require human confirmation?

  • Auditability: Can we trace every action back to a user, a token, and a timestamp? Are logs complete enough to satisfy compliance audits?

  • Memory Boundaries: How do we prevent agents from “remembering” across tenants, teams, or projects where data should stay isolated?

  • Token Management: Are tokens short-lived and scoped narrowly enough to minimize risk if leaked?

  • Failure Modes: If the agent is uncertain, does it escalate gracefully, or does it guess and risk unintended consequences?

  • Data Governance: What retention policies apply to agent memory and logs? Do they align with corporate or regulatory requirements (GDPR, HIPAA, etc.)?

  • Third-Party Dependencies: If agents rely on external APIs/tools, how are those governed, and do they inherit your security model?

  • End-User Awareness: Do users understand that agents act on their behalf — and do they know when confirmation will be required?

Thanks Rubben, I liked the content. I’m wondering how you made sure an agent has access only to an approved list of MCPs. In a highly regulated environment you may want to avoid a situation where your agent goes rogue and invokes tools it shouldn’t; of course, it may not have identity tokens to perform certain actions. Also, I don’t see you speaking about the Dynamic Client Registration mandate from MCP, which states clients should be able to auto-register themselves. Do you have any thoughts there?