<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Your Enterprise Architect]]></title><description><![CDATA[Your Enterprise Architect]]></description><link>https://yourenterprisearchitect.com</link><generator>RSS for Node</generator><lastBuildDate>Tue, 07 Apr 2026 19:54:23 GMT</lastBuildDate><atom:link href="https://yourenterprisearchitect.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Beyond the Context Window: Implementing CoALA for State-Aware Enterprise Agents]]></title><description><![CDATA[1. Summary
The Problem: Statelessness in Enterprise Workflows Large Language Models (LLMs) are powerful reasoning engines, but their utility in business environments is significantly constrained by their inherent statelessness. In a standard deployme...]]></description><link>https://yourenterprisearchitect.com/beyond-the-context-window-implementing-coala-for-state-aware-enterprise-agents</link><guid isPermaLink="true">https://yourenterprisearchitect.com/beyond-the-context-window-implementing-coala-for-state-aware-enterprise-agents</guid><category><![CDATA[Episodic Memories]]></category><category><![CDATA[#Cognitive-AI]]></category><category><![CDATA[agentic AI]]></category><dc:creator><![CDATA[Ruben Rotteveel]]></dc:creator><pubDate>Wed, 10 Dec 2025 16:22:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1765383458600/b021daa2-40b6-40c5-8c4e-486193633aef.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-1-summary">1. Summary</h3>
<p><strong>The Problem: Statelessness in Enterprise Workflows</strong> Large Language Models (LLMs) are powerful reasoning engines, but their utility in business environments is significantly constrained by their inherent statelessness. In a standard deployment, an agent resets after every session, retaining no context of user preferences, specific project history, or previously corrected errors. This lack of persistent state forces users to redundantly provide context and correct the same mistakes across multiple interactions, capping the efficiency gains that agents can provide.</p>
<p><strong>The Goal</strong> My objective was to evolve the agent from a transient session-based tool into a persistent <strong>Institutional Asset</strong>—a system that retains operational context, learns from user feedback, and improves its baseline performance over time.</p>
<p><strong>The Solution</strong> To achieve this, we implemented an architecture based on the <a target="_blank" href="https://arxiv.org/pdf/2309.02427"><strong>CoALA</strong></a> <strong>(Cognitive Architectures for Language Agents)</strong> framework. By engineering a background memory processor (the "Hippocampus") and a dynamic context injection layer, we created a system that persists experience and democratizes institutional knowledge (a learning agent). This article details the technical implementation of that architecture.</p>
<hr />
<h3 id="heading-2-the-constraint-managing-the-context-window">2. The Constraint: Managing the Context Window</h3>
<p>Communication with an agent is governed by the <strong>Context Window</strong>, which effectively functions as the agent's working memory. This window is finite, measurable in tokens, and determines the scope of data available for immediate reasoning.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765382172167/8c01c309-2b6f-4d08-bf80-c20305e64e52.png" alt="working memory, the context window" class="image--center mx-auto" /></p>
<p>Fig. 1: The agent context window.</p>
<p>Standard implementations often utilize a "sliding window" approach to manage this limit, where the oldest messages are discarded as new ones arrive. In a complex workflow, this leads to <strong>Contextual Drift</strong>—the loss of critical instructions or project constraints established early in the session.</p>
<p><strong>The Fix: Context Engineering via Summary Injection</strong> To mitigate data loss, we rejected the sliding window in favor of a structured <strong>Context Stack</strong> that prioritizes information relevance over recency:</p>
<ul>
<li><p><strong>The System Layer:</strong> Contains immutable behavioral instructions and guardrails.</p>
</li>
<li><p><strong>The Summary Layer:</strong> Rather than discarding history, the system compresses prior turns into a high-density "Summary Message." This preserves the global context of the session without consuming the token budget of raw logs.</p>
</li>
<li><p><strong>The Active Thread:</strong> The most recent interactions are retained in high fidelity to facilitate immediate reasoning.</p>
</li>
</ul>
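<p>The layered stack can be sketched in a few lines of Python. This is an illustrative model only; the class name, the <code>summarize</code> callable, and the turn limit are assumptions, not the production implementation:</p>

```python
from dataclasses import dataclass, field

@dataclass
class ContextStack:
    """Assembles the prompt from three layers: system, summary, active thread."""
    system: str                                  # immutable instructions and guardrails
    summary: str = ""                            # compressed history of earlier turns
    active: list = field(default_factory=list)   # recent turns, kept verbatim
    max_active_turns: int = 6                    # fidelity window before compression

    def add_turn(self, role: str, text: str, summarize) -> None:
        self.active.append((role, text))
        # Instead of discarding the oldest turn (sliding window), fold it
        # into the summary layer so the global context survives.
        while len(self.active) > self.max_active_turns:
            old_role, old_text = self.active.pop(0)
            self.summary = summarize(self.summary, f"{old_role}: {old_text}")

    def render(self) -> list:
        msgs = [{"role": "system", "content": self.system}]
        if self.summary:
            msgs.append({"role": "system",
                         "content": f"Summary of earlier turns: {self.summary}"})
        msgs += [{"role": r, "content": t} for r, t in self.active]
        return msgs
```

<p>In production the <code>summarize</code> callable would be an LLM call that compresses a turn into the high-density summary message; here it can be any text compressor.</p>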
<p>This shift allowed us to move from a linear log of text to a cyclic learning system.</p>
<h3 id="heading-3-the-process-asynchronous-episode-extraction">3. The Process: Asynchronous Episode Extraction</h3>
<p>Preserving the <em>current</em> session is only the first step. The critical challenge is capturing lessons from <em>past</em> sessions. To achieve this, we developed an asynchronous background process, internally referred to as the <strong>Hippocampus</strong>, that executes when a conversation reaches a natural pause.</p>
<p>This process transforms unstructured chat logs into structured <strong>Episodes</strong>. However, a simple log dump is too noisy. To extract meaningful signal, we apply a strict segmentation logic based on our "Swarm" architecture.</p>
<p><strong>Step 1: Segmentation (Defining Boundaries)</strong> Our system utilizes a swarm of specialist agents, each equipped with specific tools. We use these natural architectural divisions to slice the conversation stream:</p>
<ul>
<li><p><strong>Conversation Boundaries:</strong> Defined by <strong>Agent Changes</strong>. When the routing layer switches from the "Coder Agent" to the "Billing Agent," the current Conversation object is closed and a new one begins.</p>
</li>
<li><p><strong>Topic Boundaries:</strong> Defined by <strong>Entity Changes</strong>. Within a conversation, if the user pivots from "Project Alpha" to "Project Beta," we detect this entity shift (via spaCy) and create a new Topic node.</p>
</li>
</ul>
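<p>A minimal sketch of this two-level segmentation, with entity extraction abstracted behind a callable (the article uses spaCy for entity detection; everything else here is illustrative):</p>

```python
def segment(messages, extract_entities):
    """Slice a message stream into Conversations (agent changes) and
    Topics (entity shifts), per the boundary rules above.

    messages: iterable of dicts with 'agent' and 'text' keys.
    extract_entities: callable text -> set of entity strings (e.g. spaCy NER).
    """
    conversations = []
    prev_agent = object()   # sentinel that never equals a real agent
    prev_entities = set()
    for msg in messages:
        entities = extract_entities(msg["text"])
        new_conversation = msg["agent"] != prev_agent
        # A topic boundary is an entity shift within a conversation.
        new_topic = new_conversation or (entities and entities != prev_entities)
        if new_conversation:
            conversations.append({"agent": msg["agent"], "topics": []})
        if new_topic:
            conversations[-1]["topics"].append({"entities": entities, "messages": []})
        conversations[-1]["topics"][-1]["messages"].append(msg["text"])
        prev_agent = msg["agent"]
        if entities:
            prev_entities = entities
    return conversations
```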
<p><strong>Step 2: Classification &amp; Validation</strong> Once boundaries are established, we extract granular metadata for every message, including <strong>Domain</strong>, <strong>Operation</strong>, and <strong>Speech Act</strong> (Question, Command, Comment).</p>
<p>Finally, we employ a reasoning model to validate the extracted segments against strict rules. The primary acceptance criterion for an Episode is that <strong>a set of actions must culminate in a tangible result.</strong> If a segment is just "chatter" without an outcome, it is discarded. If the validation fails (e.g., the result is ambiguous), the model receives feedback and retries the extraction.</p>
<p>This rigorous filtering ensures that our Vector Database is populated only with high-value operational patterns, rather than noise.</p>
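<p>The validate-and-retry loop can be sketched as follows. The <code>extract</code> and <code>validate</code> callables stand in for the LLM extraction and the reasoning-model check; all names and signatures are hypothetical:</p>

```python
def extract_episode(segment, extract, validate, max_retries=3):
    """Run extraction, validate the result, and feed failures back
    into the next attempt.

    extract:  (segment, feedback) -> candidate episode dict, or None
    validate: episode -> (ok: bool, feedback: str)
    Returns a validated episode, or None for result-less "chatter".
    """
    feedback = None
    for _ in range(max_retries):
        episode = extract(segment, feedback)
        if episode is None:            # no tangible result: discard as chatter
            return None
        ok, feedback = validate(episode)
        if ok:
            return episode             # accepted -> write to the vector DB
    return None                        # still ambiguous after retries: discard
```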
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765382797062/3593740f-c7a8-4ec3-ac18-4a2c57efcd16.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-4-the-feedback-loop-episodic-injection">4. The Feedback Loop: Episodic Injection</h3>
<p>The existence of stored memory is insufficient; the agent must be architected to retrieve and apply it contextually.</p>
<p>We evaluated several retrieval strategies. <strong>Chain of Thought (CoT)</strong> prompting often proved too rigid; if a retrieved memory did not perfectly align with the current scenario, the agent would hallucinate constraints or fail to adapt.</p>
<p><strong>The Solution: The "Memory Message"</strong> We implemented a pattern of <strong>Episodic Injection</strong>.</p>
<ol>
<li><p><strong>Retrieval:</strong> When a user initiates a prompt, the system queries the vector database for semantically similar past Episodes.</p>
</li>
<li><p><strong>Injection:</strong> Relevant Episodes are formatted into a dedicated <strong>Memory Message</strong> and injected into the Context Stack <em>prior</em> to the user's new prompt.</p>
</li>
</ol>
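<p>A sketch of the injection step, assuming a <code>search_episodes</code> vector-store query function (a hypothetical interface; the formatting of the Memory Message is also illustrative):</p>

```python
def build_prompt(system_msg, history, user_prompt, search_episodes, k=3):
    """Inject retrieved Episodes as a dedicated Memory Message, placed
    in the stack before the user's new prompt.

    search_episodes: callable query text -> list of episode strings,
    ordered by semantic similarity.
    """
    episodes = search_episodes(user_prompt)[:k]
    messages = [{"role": "system", "content": system_msg}, *history]
    if episodes:
        memory = "\n".join(f"- {e}" for e in episodes)
        messages.append({"role": "system",
                         "content": f"Relevant past episodes:\n{memory}"})
    messages.append({"role": "user", "content": user_prompt})
    return messages
```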
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765383236283/cd4c2112-acca-474d-b493-2ad296a5e2d2.png" alt class="image--center mx-auto" /></p>
<p><strong>Operational Example:</strong></p>
<ul>
<li><p><strong>User Request:</strong> "Generate a SQL query for the User Analytics table."</p>
</li>
<li><p><strong>Memory Injection:</strong> "Observation: In a previous session regarding 'User Analytics', the user rejected a query for lacking index hints. Result: Negative."</p>
</li>
<li><p><strong>Agent Action:</strong> The agent preemptively includes index hints in the generated SQL, avoiding the previous error.</p>
</li>
</ul>
<h3 id="heading-5-the-system-architecture">5. The System Architecture</h3>
<p>By integrating these components, the architecture transitions from a linear input-output model to a cyclic cognitive system.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765382273655/8331a38a-bad6-410c-84b7-93bc38a7b69a.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p><strong>The Context Window</strong> manages immediate execution.</p>
</li>
<li><p><strong>The Hippocampus</strong> manages consolidation (writing experience).</p>
</li>
<li><p><strong>The Injection Layer</strong> manages retrieval (reading experience).</p>
</li>
</ul>
<p>This creates a closed loop: <strong>Sense → Reason → Act → Learn.</strong></p>
<p>The business value of this architecture extends beyond efficiency. It effectively captures <strong>Tacit Knowledge</strong>, the unwritten, experiential knowledge of senior staff, and structures it. This allows junior team members to benefit from the accumulated experience of the organization automatically, as the agent retrieves "senior" strategies to guide "junior" requests.</p>
<h3 id="heading-6-future-development">6. Future Development</h3>
<p>This implementation establishes the foundation for a state-aware agent. Our roadmap focuses on three advanced capabilities:</p>
<ol>
<li><p><strong>Topic-Based Retrieval:</strong> Moving beyond semantic similarity in individual prompts to analyzing the broader <em>topic</em> or <em>domain</em> of a conversation, enabling the retrieval of strategic context rather than just tactical corrections.</p>
</li>
<li><p><strong>Innovation via Negation:</strong> Rather than prescribing a specific path, we aim to retrieve "Failure Modes" to define a boundary of negative constraints. This allows the agent to innovate within the solution space while strictly avoiding known pitfalls.</p>
</li>
<li><p><strong>Synthetic Best Practices:</strong> Pre-loading the vector database with "Synthetic Memories" derived from corporate documentation and policy. This would provide a newly deployed agent with a baseline of institutional competence immediately upon activation.</p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Real World Agentic Solutions: Turning Microservices into an AI Workforce.]]></title><description><![CDATA[I don’t remember the last time I had so much fun building something. The process felt less like writing software and more like assembling a cognitive entity — giving it personality, skills, memory, and the ability to learn and evolve.
At times it fel...]]></description><link>https://yourenterprisearchitect.com/real-world-agentic-solutions-turning-microservices-into-an-ai-workforce</link><guid isPermaLink="true">https://yourenterprisearchitect.com/real-world-agentic-solutions-turning-microservices-into-an-ai-workforce</guid><category><![CDATA[agentic AI]]></category><category><![CDATA[Multi-Agent Systems (MAS)]]></category><category><![CDATA[Swarm]]></category><dc:creator><![CDATA[Ruben Rotteveel]]></dc:creator><pubDate>Tue, 16 Sep 2025 16:08:06 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1757978485083/09d6e340-6786-403f-9822-925816bcf5d0.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I don’t remember the last time I had so much fun building something. The process felt less like writing software and more like <strong>assembling a cognitive entity</strong> — giving it personality, skills, memory, and the ability to learn and evolve.</p>
<p>At times it felt like <em>Frankenstein</em> — but instead of a monster, I was creating a team: the best Business Analysts, Product Owners, Project Managers, and Resource Managers, rolled into agents that never sleep, always learn, and are eager to help. It was magical, and honestly, humbling.</p>
<p>But then came the real challenge: <strong>how do you move beyond proofs of concept and demos, and integrate a multi-agent solution into a real production platform?</strong></p>
<p>This article is the <strong>overview</strong> of how I approached that challenge in building <strong>Mezzoic’s Agent helper</strong>. It introduces the architecture I used, the problems I ran into, and the principles I found essential.</p>
<p>In this series, I dive deeper into each of those problem areas:</p>
<ul>
<li><p><a target="_blank" href="https://yourenterprisearchitect.com/context-is-everything-managing-tokens-memory-and-prompts-for-multi-agent-systems"><em>Context is Everything</em></a> → how prompts, memory, and tokens define agent reliability.</p>
</li>
<li><p><a target="_blank" href="https://yourenterprisearchitect.com/from-apis-to-agent-tools-designing-for-multi-agent-systems"><em>From APIs to Agent Tools</em></a> → how MCP servers transform APIs into safe, intent-driven tools.</p>
</li>
<li><p><a target="_blank" href="https://yourenterprisearchitect.com/agent-architecture-security-and-trust"><em>Security &amp; Trust</em></a> → how to align agent security with your existing backend policies.</p>
</li>
</ul>
<p>Together, these pieces are about one thing: turning multi-agent systems from fragile experiments into <strong>a real domain workforce inside your architecture</strong>.</p>
<h2 id="heading-challenge-keep-the-architecture-simple">Challenge: Keep the Architecture Simple</h2>
<p>The magic of multi-agent systems is easy to capture in a demo. The hard part is making them work inside a real production system.</p>
<p>With <strong>Mezzoic</strong>, I started from a simple idea: if we already have a solid domain architecture with <strong>domain services</strong>, why not extend them into <strong>domain agents</strong>? Services represent bounded areas of expertise in the product: accounting, quoting, scheduling. Agents can be thought of as the experts that live inside those domains, capable of reasoning and acting just like the people on a project team.</p>
<p>That shift, from <strong>domain services → domain agents → domain workforce,</strong> became the foundation of the design.</p>
<p>But here’s the key: I didn’t want to rip up a working system just to experiment with agents. I wanted agents to act as <strong>extensions, not modifications</strong>, of the core. This is the idea of applying the <strong>Open/Closed Principle</strong> from SOLID design to your architecture:</p>
<ul>
<li><p>Keep the platform closed to modification (stable, proven, secure).</p>
</li>
<li><p>Keep it open to extension (agents can plug in on top).</p>
</li>
</ul>
<p>This is my golden rule of <strong>don’t break what’s already working</strong>.</p>
<p>The way to achieve that in practice is by <strong>wrapping existing APIs with MCP servers</strong>. MCP provides a protocol, much like REST, but tailored for agents. It turns APIs into <strong>discoverable, intent-driven tools</strong> that agents can actually use.</p>
<p>👉 I dive deeper into this in the article <a target="_blank" href="https://yourenterprisearchitect.com/from-apis-to-agent-tools-designing-for-multi-agent-systems"><em>From APIs to Agent Tools</em></a>, but at the overview level the principle is simple: <strong>Don’t redesign your backend for agents. Extend it with MCP.</strong></p>
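<p>To make the "wrap, don't redesign" idea concrete, here is a generic sketch of an existing API endpoint wrapped as a discoverable, intent-driven tool. This is not the MCP SDK itself; the class, the quote endpoint, and all names are invented for illustration:</p>

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AgentTool:
    """Wraps an existing API endpoint as a discoverable, intent-driven tool.
    (Generic sketch; a real MCP server would expose this over the protocol.)"""
    name: str
    description: str      # the intent the LLM reads when picking a tool
    input_schema: dict    # JSON-schema-style argument description
    handler: Callable     # the untouched backend API call

    def invoke(self, **kwargs):
        # Validation and authorization stay in the API layer;
        # the tool only adapts the shape for agent consumption.
        return self.handler(**kwargs)

def existing_quote_api(customer_id: str, amount: float) -> dict:
    # Stand-in for an existing domain-service endpoint (hypothetical).
    return {"customer": customer_id, "quote": round(amount * 1.21, 2)}

create_quote = AgentTool(
    name="create_quote",
    description="Create a quote for a customer, returning the total incl. VAT.",
    input_schema={"customer_id": "string", "amount": "number"},
    handler=existing_quote_api,
)
```

<p>The design choice this illustrates: the backend function is untouched, and the agent-facing layer only adds a name, an intent description, and a schema, keeping the platform closed to modification but open to extension.</p>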
<h3 id="heading-architecture-overview">Architecture Overview</h3>
<p>Mezzoic's multi-agent solution uses the <strong>agent swarm model</strong> for orchestration. Instead of a single supervisor agent delegating tasks, agents collaborate more fluidly, each equipped with the tools it needs to step in and act. This makes the system more resilient, but it also means tool design must be clear to avoid collisions. Each agent is a domain expert; in the Mezzoic context, that means there are dedicated Project Manager, Product Owner, Business Analyst, and People Manager agents that you interact with.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757528836887/63d126ed-4704-4276-bc40-981acd1e9a53.png" alt class="image--center mx-auto" /></p>
<p>The color coding differentiates what's new from what already exists: blue marks the new agent-focused components, purple marks the MCP server extensions to your backend, and yellow marks the existing systems you're leveraging.</p>
<ul>
<li><p>One key design choice: agent layer = <strong>extension, not modification</strong>.</p>
</li>
<li><p>Separation of concerns:</p>
<ul>
<li><p>Agents → reasoning + orchestration</p>
</li>
<li><p>APIs → business logic + enforcement</p>
</li>
<li><p>MCP → bridge between them</p>
</li>
</ul>
</li>
</ul>
<p>👉 You can read more about this in my in-depth article, <a target="_blank" href="https://yourenterprisearchitect.com/from-apis-to-agent-tools-designing-for-multi-agent-systems"><em>From APIs to Agent Tools</em></a>.</p>
<h2 id="heading-challenge-understanding-and-managing-context">Challenge: Understanding and Managing Context</h2>
<p><strong>Context is everything.</strong><br />The biggest challenge by far is context; it is the new skill we have to master to make agents and GenAI work. Context is both the power and the Achilles' heel of agents. Get it right and your users will love the agent; get it wrong and they'll get frustrated pretty quickly.</p>
<p>Context is everything the model receives: instructions, tools, history, and responses. It defines the agent's personality, skills, and reliability.</p>
<p>The challenges I faced were:</p>
<ul>
<li><p>Running up against the token limit: LLMs cap the context window, and when the data you pull is content-heavy, the context fills up very quickly.</p>
</li>
<li><p>Producing effective prompts and descriptions</p>
</li>
<li><p>Creating easy-to-use tools that don't require elaborate, hard-to-follow instructions.</p>
</li>
<li><p>Managing context so I give the agent <em>enough</em> context to be helpful and consistent.</p>
</li>
</ul>
<p>The solution was to treat context as a <strong>first-class architectural concern</strong>. It wasn’t an afterthought; it became the core design problem.</p>
<p>What worked:</p>
<ul>
<li><p><strong>Summarization &amp; focus</strong> → keep only what’s relevant, collapse old threads into summaries.</p>
</li>
<li><p><strong>Specialized agents</strong> → each agent carries only the prompts and tools it needs, minimizing clutter.</p>
</li>
<li><p><strong>Semantic &amp; episodic memory</strong> → facts (preferences, roles) and experiences (what worked last time) stored outside the prompt, retrieved when needed.</p>
</li>
<li><p><strong>Async/event-driven updates</strong> → keep the agent focused without drowning it in stale data.</p>
</li>
<li><p><strong>Iterating on prompts and tool descriptions</strong> until the LLM understood them and could successfully accomplish our use cases.</p>
</li>
</ul>
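<p>The summarization-and-focus tactic can be sketched as a token-budget check that folds the oldest messages into a summary. The 4-characters-per-token heuristic and all names here are assumptions, not the production code:</p>

```python
def fit_to_budget(messages, token_budget, summarize, count_tokens=None):
    """Collapse the oldest messages into a summary until the thread fits
    the model's context window (rough token heuristic; illustrative only).

    summarize: callable (current_summary, text) -> new summary string.
    """
    count = count_tokens or (lambda text: max(1, len(text) // 4))
    def total(msgs):
        return sum(count(m["content"]) for m in msgs)

    summary = ""
    msgs = list(messages)
    # Fold from the front: oldest turns lose fidelity first.
    while len(msgs) > 1 and total(msgs) > token_budget:
        oldest = msgs.pop(0)
        summary = summarize(summary, oldest["content"])
    if summary:
        msgs.insert(0, {"role": "system", "content": f"Summary: {summary}"})
    return msgs
```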
<p>👉 <strong>Read more in</strong> <a target="_blank" href="https://yourenterprisearchitect.com/context-is-everything-managing-tokens-memory-and-prompts-for-multi-agent-systems"><strong>Deep Dive #2: Context Is Everything</strong></a></p>
<h2 id="heading-challenge-security-amp-trust">Challenge: Security &amp; Trust</h2>
<blockquote>
<p><em>The concern with adding agents is that they might bypass existing controls — giving users unintended powers or introducing a shadow security model.</em></p>
</blockquote>
<p><strong>Solution:</strong> leverage the existing API security model.</p>
<p>In Mezzoic, agents are simply <strong>extensions of existing domain services</strong>. That meant I could lean on the security model I already trusted: the APIs already implement Mezzoic’s governance and security requirements, and agent tools just wrap those APIs and can’t circumvent them.</p>
<p>Agents don’t need special powers; they need to <strong>respect the same policies as the rest of the system</strong>.</p>
<p>What worked:</p>
<ul>
<li><p><strong>OAuth OBO</strong> → agents act strictly on behalf of users.</p>
</li>
<li><p><strong>Narrow scopes &amp; short-lived tokens</strong> → tools stay limited and time-boxed.</p>
</li>
<li><p><strong>Egress allow-lists</strong> → agents only reach approved MCP servers.</p>
</li>
<li><p><strong>Auditability</strong> → every call is logged: who did what, when, via which tool.</p>
</li>
</ul>
<p>👉 <strong>Read more in</strong> <a target="_blank" href="https://yourenterprisearchitect.com/agent-architecture-security-and-trust"><strong>Deep Dive #3: Security &amp; Trust</strong></a></p>
<hr />
<h2 id="heading-closing">Closing</h2>
<p>Extending Mezzoic with agents taught me three big lessons:</p>
<ol>
<li><p><strong>Keep the architecture simple (KISS)</strong> → agents should extend the system, not complicate or replace it.</p>
</li>
<li><p><strong>Context is everything</strong> → the reliability and personality of agents depend more on context management than on the model itself.</p>
</li>
<li><p><strong>Security &amp; trust must be inherited</strong> → agents don’t need a new security model; they need to respect the one you already trust.</p>
</li>
</ol>
<p>Those principles turned Mezzoic’s agents from fragile demos into a <strong>domain workforce</strong>: specialists that collaborate, stay within policy, and scale with the platform.</p>
<p>This article gave the overview. The real detail lives in the deep dives:</p>
<ul>
<li><p><a target="_blank" href="https://yourenterprisearchitect.com/context-is-everything-managing-tokens-memory-and-prompts-for-multi-agent-systems"><strong>Context is Everything</strong></a> → tokens, memory, prompts.</p>
</li>
<li><p><a target="_blank" href="https://yourenterprisearchitect.com/from-apis-to-agent-tools-designing-for-multi-agent-systems"><strong>From APIs to Agent Tools</strong></a> → MCP servers, marshal-by-reference, tool patterns.</p>
</li>
<li><p><a target="_blank" href="https://yourenterprisearchitect.com/agent-architecture-security-and-trust"><strong>Security &amp; Trust</strong></a> → aligning agents with existing governance.</p>
</li>
</ul>
<p>Together, these form a blueprint for moving beyond proofs of concept and into <strong>production-ready multi-agent systems</strong>.</p>
]]></content:encoded></item><item><title><![CDATA[Agent Architecture: Security & Trust]]></title><description><![CDATA[Everyone I’ve spoken with about agents asks the same thing: “What about security?”
The concern isn’t just technical, it’s governance. If an agent makes a mistake, who’s accountable? If it accesses data, which policies apply?
In this article, I share ...]]></description><link>https://yourenterprisearchitect.com/agent-architecture-security-and-trust</link><guid isPermaLink="true">https://yourenterprisearchitect.com/agent-architecture-security-and-trust</guid><category><![CDATA[agentic security]]></category><category><![CDATA[agent guardrails]]></category><category><![CDATA[agent trust]]></category><category><![CDATA[mcp OAuth2.1]]></category><category><![CDATA[agentic AI]]></category><dc:creator><![CDATA[Ruben Rotteveel]]></dc:creator><pubDate>Tue, 16 Sep 2025 13:14:19 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1757971610066/dcf7a4db-51eb-43a3-9be9-b0fa87400cf3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Everyone I’ve spoken with about agents asks the same thing: <em>“What about security?”</em></p>
<p>The concern isn’t just technical, it’s governance. If an agent makes a mistake, who’s accountable? If it accesses data, which policies apply?</p>
<p>In this article, I share the principles I use to answer those questions and align agent security with enterprise trust models. The context is Mezzoic, the product and project management platform I built, but the lessons apply to any organization exploring multi-agent systems.</p>
<h2 id="heading-challenge-trust">Challenge: Trust</h2>
<p>The biggest question in Security &amp; Trust is <strong>accountability</strong>: who is responsible for the actions taken by (or on behalf of) the user?</p>
<p>My opinion is clear: <strong>accountability lies with the user and the application developers, not the agent.</strong> Agents are too new, too naive, to be trusted with unsupervised accountability. They lack mature governance models, and “guardrails” today are more marketing promise than operational reality. There are simply too many loopholes to plug with brittle, complex rules.</p>
<p>That’s why I avoid over-engineered guardrails altogether. Instead, I focus on three principles:</p>
<ol>
<li><p><strong>Propagate the user’s context</strong> all the way from the agent to the API layer (OAuth OBO).</p>
</li>
<li><p><strong>Require Human-in-the-Loop (HITL) confirmation</strong> for any risky action.</p>
</li>
<li><p>Use <strong>Rules engines and Workflows</strong> to determine if actions are allowed (could be process or risk based).</p>
</li>
</ol>
<h3 id="heading-guardrails-and-human-in-the-loop">Guardrails and Human-in-the-Loop</h3>
<p>Guardrails ≠ governance. They may provide some protection, but they cannot ensure safety. These guardrails aren’t hard constraints; they’re soft, because they’re interpreted by the LLM, and the LLM is unconstrained and unpredictable. Human review, on the other hand, guarantees accountability where it matters.</p>
<ul>
<li><p><strong>Transparency over hidden controls</strong> → The agent should <em>show its reasoning</em> (“I plan to delete these 5 records because…”) and ask for confirmation.</p>
</li>
<li><p><strong>Effort vs. impact</strong> → Guardrails are costly and brittle; human review is clearer, cheaper, and more adaptable.</p>
</li>
<li><p><strong>Fail gracefully</strong> → If the agent isn’t sure, it should escalate — not guess.</p>
</li>
</ul>
<p>👉 <em>Use agents to accelerate safe work, not bypass human judgment in high-risk areas.</em></p>
<h3 id="heading-risk-hierarchy-where-human-in-the-loop-is-required">Risk Hierarchy (where Human-in-the-Loop is required)</h3>
<ul>
<li><p><strong>Information loss</strong> → Any delete operation (soft or hard) must require explicit confirmation.</p>
</li>
<li><p><strong>Information confusion</strong> → Creating or updating critical entities should always prompt user review.</p>
</li>
<li><p><strong>Financial transactions</strong> → Any movement of funds requires human approval. (<em>Specialized automated trading systems are the exception, as they rely on dedicated risk controls.</em>)</p>
</li>
</ul>
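<p>This hierarchy maps naturally onto a small lookup the orchestration layer can consult before executing a tool call. The operation names below are illustrative, not an exhaustive policy:</p>

```python
# Illustrative mapping from operation type to the risk class that
# makes Human-in-the-Loop confirmation mandatory.
HITL_REQUIRED = {
    "delete": "information loss",
    "create_critical": "information confusion",
    "update_critical": "information confusion",
    "transfer_funds": "financial transaction",
}

def requires_confirmation(operation: str) -> tuple:
    """Return (must_pause, reason): whether the operation must stop
    and wait for explicit human confirmation before executing."""
    reason = HITL_REQUIRED.get(operation)
    return (reason is not None, reason)
```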
<h2 id="heading-challenge-authentication-and-authorization">Challenge: Authentication and Authorization</h2>
<p>Where does authorization belong? <strong>Not in the agent.</strong> It belongs in the MCP servers or your APIs, which already enforce your organization’s policies. That way, agents can’t “make up” permissions — they only operate within the boundaries you already trust.</p>
<p>The MCP <a target="_blank" href="https://modelcontextprotocol.io/specification/2025-03-26/basic/authorization">specification</a> recommends using <strong>OAuth 2.1</strong> for authentication and authorization, particularly when exposing MCP servers. In practice, support for OAuth 2.1 is still maturing across identity providers, so many teams continue to rely on <strong>OAuth 2.0 flows (e.g., Authorization Code with PKCE, On-Behalf-Of)</strong> to achieve the same goals.</p>
<p>In practice, agents should never have their own ‘superpowers.’ They should always inherit the same identity, rules, and auditability as the human user. In Mezzoic <strong>users authenticate via OAuth 2.0 + PKCE</strong>, and downstream calls use an <strong>On Behalf Of-style exchange</strong> so every tool invocation carries the user’s identity and scopes. This keeps authorization centralized in the APIs while staying consistent with the MCP spec’s direction of travel.</p>
<h3 id="heading-why-on-behalf-of-matters">Why On Behalf Of Matters</h3>
<p>In Mezzoic, agents never act as their own identity. Instead, they always act <strong>On Behalf Of (OBO)</strong> the user. In practice, that means:</p>
<ul>
<li><p><strong>Every API call is user-scoped.</strong> The downstream token carries the user’s identity and entitlements.</p>
</li>
<li><p><strong>Policies remain intact.</strong> Existing RBAC/ABAC rules apply just as if the user had called directly.</p>
</li>
<li><p><strong>Audit trails stay clear.</strong> Logs show which user did what, even if the agent initiated the call.</p>
</li>
</ul>
<p>This preserves accountability: if an agent triggers a workflow, it’s still the user’s token authorizing it.</p>
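<p>As a sketch, the downstream exchange can be expressed as an OAuth 2.0 Token Exchange (RFC 8693) request body, where the user's incoming token becomes the <code>subject_token</code>. Azure AD's OBO flow uses a slightly different grant type; parameter names here follow RFC 8693, and all values are placeholders:</p>

```python
from urllib.parse import urlencode

def obo_exchange_request(user_token: str, client_id: str,
                         client_secret: str, scope: str) -> dict:
    """Build an RFC 8693 token-exchange request body: trade the user's
    incoming token for a narrowly scoped downstream token, so every
    tool invocation still carries the user's identity and entitlements."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": user_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,  # keep scopes narrow; tokens stay short-lived
    }
```

<p>The resulting dict would be form-encoded (e.g. with <code>urlencode</code>) and POSTed to the identity provider's token endpoint.</p>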
<h3 id="heading-why-not-other-approaches">Why not other approaches?</h3>
<ul>
<li><p><strong>Static API keys (even per-user):</strong> Technically possible, but long-lived keys are harder to rotate, scope, and audit.</p>
</li>
<li><p><strong>Agent-owned service accounts:</strong> Creates “shadow identities” with broad privileges, eroding accountability.</p>
</li>
<li><p><strong>User IDs in payloads:</strong> Easy to spoof and bypasses real authorization. Shouldn’t be relied on.</p>
</li>
</ul>
<p>👉 By contrast, <strong>OAuth with OBO</strong> reuses your IdP, MFA, and conditional access policies. Tokens are short-lived, scoped, and centrally governed.</p>
<hr />
<h2 id="heading-challenge-data-security-amp-memory">Challenge: Data Security &amp; Memory</h2>
<p>Memory creates new risks because by default it has <strong>no natural boundaries</strong>. An agent could potentially “remember” across tenants, users, or projects, but should it?</p>
<p>This is not a technical detail to leave implicit. <strong>Data isolation must be a deliberate product decision.</strong> Teams need to code explicit constraints so memory aligns with governance requirements:</p>
<ul>
<li><p><strong>User scope:</strong> What information is strictly personal to a single user?</p>
</li>
<li><p><strong>Team/Org scope:</strong> What knowledge can be shared safely across a team or department?</p>
</li>
<li><p><strong>Tenant scope:</strong> What data must <em>never</em> cross boundaries between customers?</p>
</li>
</ul>
<p>Without these rules, “helpful” memory can quickly become a <strong>security liability</strong>, an agent recalling sensitive details from the wrong context.</p>
<h3 id="heading-what-this-means-in-practice">What this means in practice</h3>
<ul>
<li><p><strong>Retention policies:</strong> Decide how long facts or episodes should persist, and when to expire or archive them.</p>
</li>
<li><p><strong>Indexing strategy:</strong> Memory should be tagged with clear ownership (user ID, team ID, tenant ID) so lookups respect boundaries.</p>
</li>
<li><p><strong>Privacy by default:</strong> If in doubt, agents should forget or re-request rather than risk leaking data across contexts.</p>
</li>
<li><p><strong>Auditability:</strong> Memory writes and reads should be logged just like API calls. You want to know <em>who stored what, and who retrieved it later</em>.</p>
</li>
</ul>
<p>👉 Memory can make agents feel smart, but without scoped design it can also make them dangerous. The safe approach is to <strong>treat memory like a database with permissions</strong>.</p>
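<p>A minimal sketch of that idea: every record carries ownership tags, and reads filter by scope. Retention and real audit logging are omitted, and all names are illustrative:</p>

```python
class ScopedMemory:
    """Memory treated like a database with permissions: every record is
    tagged with tenant/team/user ownership, and reads respect boundaries."""

    def __init__(self):
        self._records = []
        self._audit = []   # who stored what, who retrieved it later

    def write(self, tenant, user, text, team=None):
        self._records.append({"tenant": tenant, "team": team,
                              "user": user, "text": text})
        self._audit.append(("write", user, text))

    def read(self, tenant, user, team=None):
        self._audit.append(("read", user, None))
        return [r["text"] for r in self._records
                if r["tenant"] == tenant                    # hard tenant boundary
                and (r["user"] == user                      # personal scope
                     or (team is not None and r["team"] == team))]  # team scope
```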
<h2 id="heading-challenge-autonomous-agents">Challenge: Autonomous Agents</h2>
<p>Autonomous agents are powerful but risky when their actions have financial, legal, or safety implications. Treat agents like privileged humans: enforce least privilege, require explicit approvals, and maintain a full audit trail. Put a policy-decision point (rules engine) in front of every high-impact tool, and drive execution through a workflow engine that can require approvals, cap the blast radius, and record decisions. If the rules engine flags risk above a threshold, escalate to a human, to a second agent for consensus, or invoke a two-person rule, depending on risk appetite and use case.</p>
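<p>A sketch of such a policy decision point, routing a proposed action by risk score. The thresholds and outcome names are illustrative and should be tuned to your risk appetite:</p>

```python
def route_action(action: str, risk_score: float, *,
                 hitl_threshold=0.5, consensus_threshold=0.3) -> str:
    """Policy-decision-point sketch placed in front of high-impact tools.
    Returns one of: 'execute', 'second_agent_review', 'human_approval'."""
    if risk_score >= hitl_threshold:
        return "human_approval"        # e.g. fund transfers, deletes
    if risk_score >= consensus_threshold:
        return "second_agent_review"   # consensus before execution
    return "execute"                   # low impact: proceed, but audit it
```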
<h2 id="heading-conclusion-questions-teams-should-ask-before-deploying-agents">Conclusion: <strong>Questions Teams Should Ask Before Deploying Agents</strong></h2>
<ul>
<li><p><strong>Accountability:</strong> If an agent takes an action, whose identity and permissions does it use? Who is ultimately responsible for the outcome?</p>
</li>
<li><p><strong>Authorization:</strong> Are agents operating strictly within existing policies (RBAC/ABAC), or do they create shadow permissions?</p>
</li>
<li><p><strong>Human-in-the-Loop:</strong> What types of actions (e.g., deletes, fund transfers, sensitive updates) should always require human confirmation?</p>
</li>
<li><p><strong>Auditability:</strong> Can we trace every action back to a user, a token, and a timestamp? Are logs complete enough to satisfy compliance audits?</p>
</li>
<li><p><strong>Memory Boundaries:</strong> How do we prevent agents from “remembering” across tenants, teams, or projects where data should stay isolated?</p>
</li>
<li><p><strong>Token Management:</strong> Are tokens short-lived and scoped narrowly enough to minimize risk if leaked?</p>
</li>
<li><p><strong>Failure Modes:</strong> If the agent is uncertain, does it escalate gracefully, or does it guess and risk unintended consequences?</p>
</li>
<li><p><strong>Data Governance:</strong> What retention policies apply to agent memory and logs? Do they align with corporate or regulatory requirements (GDPR, HIPAA, etc.)?</p>
</li>
<li><p><strong>Third-Party Dependencies:</strong> If agents rely on external APIs/tools, how are those governed, and do they inherit your security model?</p>
</li>
<li><p><strong>End-User Awareness:</strong> Do users understand that agents act on their behalf — and do they know when confirmation will be required?</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Context Is Everything: Managing Tokens, Memory, and Prompts for Multi-Agent Systems]]></title><description><![CDATA[If I had to simplify the work of integrating agents into one phrase, it would be this:
Context is everything.
The models are powerful, the APIs are stable, but context determines whether your agent feels like a helpful collaborator or a confused chat...]]></description><link>https://yourenterprisearchitect.com/context-is-everything-managing-tokens-memory-and-prompts-for-multi-agent-systems</link><guid isPermaLink="true">https://yourenterprisearchitect.com/context-is-everything-managing-tokens-memory-and-prompts-for-multi-agent-systems</guid><category><![CDATA[context]]></category><category><![CDATA[context engineering]]></category><category><![CDATA[context-window]]></category><dc:creator><![CDATA[Ruben Rotteveel]]></dc:creator><pubDate>Mon, 15 Sep 2025 16:49:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1757968233223/34691f0c-7f8a-4e85-9ca9-8e939a03b3bf.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If I had to simplify the work of integrating agents into one phrase, it would be this:</p>
<p><strong>Context is everything.</strong></p>
<p>The models are powerful, the APIs are stable, but context determines whether your agent feels like a helpful collaborator or a confused chatbot.</p>
<hr />
<h2 id="heading-what-we-mean-by-context">What We Mean by “Context”</h2>
<p>For an LLM, <em>context</em> is everything it receives to generate a response:</p>
<ul>
<li><p>System messages (instructions)</p>
</li>
<li><p>Tool descriptions</p>
</li>
<li><p>Conversation history</p>
</li>
<li><p>Tool responses</p>
</li>
<li><p>Even its own “internal thoughts”</p>
</li>
</ul>
<p>How you manage this context defines:</p>
<ul>
<li><p>The <strong>personality</strong> of the agent</p>
</li>
<li><p>Its <strong>consistency and accuracy</strong></p>
</li>
<li><p>And ultimately, how well it <strong>delights your users</strong></p>
</li>
</ul>
<hr />
<h2 id="heading-why-context-matters">Why Context Matters</h2>
<p>The ability of an agent to deliver results and to feel trustworthy depends more on <strong>context</strong> than on the model itself.</p>
<p>Context is the foundation of the agent’s <strong>identity, personality, and responsibilities</strong>, and most importantly its ability to <strong>make the user happy:</strong></p>
<ul>
<li><p><strong>Personality &amp; Voice</strong> → Context defines who the agent <em>is</em>. Without consistent prompts and framing, tone drifts and the agent feels incoherent.</p>
</li>
<li><p><strong>Continuity of Thought</strong> → LLMs don’t think persistently; they simulate reasoning turn by turn. Context is the bridge that makes an agent appear continuous instead of starting over each time.</p>
</li>
<li><p><strong>Shared Understanding</strong> → Context carries the conversation state: which tools were used, what the goal is, what’s already been decided. Without it, the user has to re-explain, which kills trust.</p>
</li>
<li><p><strong>Boundaries of Expertise</strong> → By constraining instructions and tools, context defines what an agent <em>should</em> and <em>shouldn’t</em> attempt. That prevents overreach and hallucinations.</p>
</li>
<li><p><strong>User Experience Consistency</strong> → Users don’t judge the LLM; they judge whether the agent “gets them.” Context is what remembers preferences, adapts, and keeps interactions smooth.</p>
</li>
</ul>
<p>👉 Put simply: <strong>context isn’t just input. It’s the agent’s memory, personality, boundaries, and shared understanding with the user. Without it, the agent isn’t really an agent — it’s just a one-off prompt.</strong></p>
<hr />
<h2 id="heading-how-to-build-prompts-and-tool-descriptions">How to Build Prompts and Tool Descriptions</h2>
<p>There’s a lot of content in the wild on <strong>prompt engineering</strong> (here’s a good <a target="_blank" href="https://cookbook.openai.com/examples/gpt-5/gpt-5_prompting_guide">prompt guide</a>), but not nearly enough has been said about <strong>tool instructions and descriptions</strong>. In practice, I’ve found this to be one of the hardest areas to get right.</p>
<p>In the past, <strong>users read the manuals</strong>. They figured out workflows, experimented, and learned the system themselves.</p>
<p>Now, that job shifts to the <strong>LLM</strong>. The model must “read the manual” through your tool descriptions, understand what each tool does, and use them correctly. The level of detail required isn’t always obvious, which makes <strong>testing and iteration critical</strong>.</p>
<p><strong>Example:</strong> imagine a <strong>Quote Agent</strong> that helps create and edit customer quotes. The agent doesn’t learn workflows from a user guide — it depends entirely on tool descriptions like <code>quote.add_line_item</code> or <code>quote.set_delivery_date</code>. If those descriptions are vague, the agent stumbles.</p>
<h3 id="heading-what-this-means">What this means</h3>
<ul>
<li><p><strong>Documentation becomes prompts and tool descriptions.</strong></p>
</li>
<li><p>These descriptions should live in their <strong>own files or repositories</strong>, but remain <strong>tightly coupled</strong> to the tool they describe.</p>
</li>
<li><p>Treat them as <strong>living artifacts</strong> — versioned, reviewed, and tested like code.</p>
</li>
</ul>
<hr />
<h3 id="heading-common-challenges-with-tool-descriptions">Common Challenges with Tool Descriptions</h3>
<p><strong>1. Complex parameters</strong></p>
<ul>
<li><p>Problem: The LLM fails to generate valid inputs, forcing it to ask the user too many clarifying questions.</p>
</li>
<li><p>Solution: Keep tools and parameters simple. Create tools with <strong>clear, narrow responsibilities</strong>. (I dive deeper into this in the <a target="_blank" href="https://some.link.com">Agent Architecture article</a>.)</p>
</li>
</ul>
<p><strong>2. Overlapping responsibilities</strong></p>
<ul>
<li><p>Problem: Tools with similar or unclear scopes confuse the agent. It may pick the wrong one and fail to complete its task.</p>
</li>
<li><p>Solution: Define <strong>exclusive domains</strong> for tools. Be explicit about when and why each tool should be used.</p>
<p>  An <em>exclusive domain</em> means each tool has a <strong>clear, non-overlapping area of responsibility</strong>, so the agent doesn’t have to guess between multiple tools that “sort of” do the same thing.</p>
<p>  Think of it like <strong>team roles</strong>: if two employees both think they’re responsible for scheduling meetings, you’ll get duplication or confusion. Same with tools.</p>
<h3 id="heading-how-to-define-it"><strong>How to define it</strong></h3>
<p>  When designing tools, write down:</p>
</li>
<li><ul>
<li><p><strong>Purpose</strong> → What <em>specific outcome</em> this tool achieves.</p>
</li>
<li><p><strong>Scope</strong> → What <em>it doesn’t do</em>. If another tool covers that, this one must stay out.</p>
</li>
<li><p><strong>Trigger conditions</strong> → When the agent should call it (e.g., “use this tool when the user wants to create a new quote, not when editing an existing one”).</p>
</li>
</ul>
</li>
</ul>
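One way to make those three fields explicit and reviewable (the structure and field names here are my own convention, not a standard) is to attach them to every tool definition and render the description the LLM reads from them:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolSpec:
    name: str
    purpose: str   # the specific outcome this tool achieves
    scope: str     # what it explicitly does NOT do
    trigger: str   # when the agent should call it

CREATE_QUOTE = ToolSpec(
    name="quote.create",
    purpose="Create a brand-new quote for a customer.",
    scope="Does not edit existing quotes; use the quote editing tools for that.",
    trigger="Use when the user wants a new quote, not when editing one.",
)

def render_description(spec: ToolSpec) -> str:
    """Flatten the spec into the description string the LLM actually reads."""
    return f"{spec.purpose} Scope: {spec.scope} When to use: {spec.trigger}"
```

Because the spec is data, it can be versioned, diffed in code review, and checked in tests alongside the tool it describes.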
<p><strong>3. Capability &amp; description drift</strong></p>
<ul>
<li><p>Problem: Descriptions and tool code often live in different places, owned by different people. When one changes without the other, the agent gets confused — outcomes don’t match the instructions.</p>
</li>
<li><p>Solution: Co-locate responsibility. <strong>Update code and descriptions together</strong>. Build processes that make drift visible (e.g., testing generated plans against descriptions).</p>
</li>
</ul>
<hr />
<p>👉 <strong>Pro tip:</strong> Treat prompts and tool descriptions as <strong>first-class code artifacts</strong> — with versioning, testing, and clear ownership. Don’t let them live as throwaway comments; they’re the interface between your system and your agent.</p>
<hr />
<h2 id="heading-the-challenge-of-token-limits">The Challenge of Token Limits</h2>
<p>Once prompts and tool descriptions are solid, the next constraint you’ll hit is the <strong>model’s token limit</strong>.</p>
<p>LLMs have hard context windows, so you can’t keep everything in context. Long prompts, long tool descriptions, the message history, data coming back from tools, files the agent has read, web search content it has found, and its internal thoughts are all part of the context. It fills up quickly.</p>
<p>Managing context is a significant challenge and has a big impact on the agent’s effectiveness.</p>
<h3 id="heading-strategies">Strategies</h3>
<ul>
<li><p><strong>FIFO truncation:</strong> Keep only the most recent messages. Simple, but loses long-term awareness.</p>
</li>
<li><p><strong>Topic-aware summarization:</strong> Summarize past exchanges by topic. When the user switches back, the summary brings the relevant history forward.</p>
</li>
<li><p><strong>Hybrid:</strong> Recent messages stay raw; older ones collapse into summaries.</p>
</li>
</ul>
<p>👉 The goal is <strong>focus</strong>: give the model exactly what it needs for the current task, nothing more.</p>
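A minimal sketch of the hybrid strategy (the summarizer is stubbed here; in practice it would be an LLM summarization call):

```python
def summarize(messages: list) -> str:
    # Stub: a real system would make an LLM call to summarize by topic.
    return f"[summary of {len(messages)} earlier messages]"

def build_context(history: list, keep_recent: int = 4) -> list:
    """Hybrid: recent messages stay raw; older ones collapse into one summary."""
    if len(history) <= keep_recent:
        return list(history)
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent

history = [f"msg {i}" for i in range(10)]
context = build_context(history)
```

With ten messages and `keep_recent=4`, the model sees one summary line plus the four latest messages instead of all ten, and the trade-off (fidelity vs. token budget) is controlled by a single knob.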
<hr />
<h2 id="heading-semantic-and-episodic-memory">Semantic and Episodic Memory</h2>
<p>Context management doesn’t end at the prompt. Agents need <strong>memory</strong> to learn from experience and to feel reliable and personal.</p>
<p>⚠️ <strong>Note:</strong> Memory is not naturally limited to a single thread or session. It’s technically <strong>unbounded</strong> — you could design it to span users, teams, projects, or even tenants. That flexibility is powerful, but it also means you must <strong>enforce explicit boundaries and retention policies</strong> to stay within privacy, regulatory, and data sovereignty requirements.</p>
<h3 id="heading-semantic-memory-facts-amp-awareness">Semantic Memory (facts &amp; awareness)</h3>
<ul>
<li><p>Stores facts about users, teams, or projects.</p>
</li>
<li><p>Lets the agent adapt to preferences, roles, and organizational context.</p>
</li>
<li><p>Can be scoped at <strong>user, team, or org</strong> levels.</p>
</li>
<li><p>Stored in vector databases (e.g., Elastic, Weaviate, Pinecone).</p>
</li>
</ul>
<p>Example:</p>
<blockquote>
<p><em>“This user prefers concise answers.”</em><br /><em>“Team A is working on Project X; Bob owns feature Y.”</em></p>
</blockquote>
<h3 id="heading-episodic-memory-experience-amp-learning">Episodic Memory (experience &amp; learning)</h3>
<ul>
<li><p>Captures experiences as <strong>episodes</strong>: subject + actions + results.</p>
</li>
<li><p>Lets the agent learn from past outcomes and apply them in similar situations.</p>
</li>
<li><p>Can be leveraged to <strong>convert a new user request into a few-shot prompt or chain-of-thought (CoT)</strong> reusing the successful steps from a prior interaction to accomplish a similar goal.</p>
</li>
<li><p>Built from conversation parsing, sentiment analysis, and conclusion markers (“that worked”, “let’s move on”).</p>
</li>
<li><p>Stored in vector stores for retrieval in future sessions.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757611717252/87a00f79-72a6-4b82-b29a-6d55545c5f11.png" alt /></p>
<p><em>Example:</em></p>
<blockquote>
<p>“Last time the user asked for a resource plan, the timeline format was off — next time, use Gantt style.”</p>
</blockquote>
<p>👉 This way, episodic memory isn’t just “remembering experiences” — it becomes <strong>structured training data for the agent’s next decision</strong>.</p>
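A sketch of that conversion (the episode structure and fields are illustrative): retrieve a prior successful episode and splice its steps into the new request as a worked example.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    goal: str
    steps: list      # actions that led to the result
    outcome: str     # conclusion marker, e.g. "that worked"

def to_few_shot(episode: Episode, new_request: str) -> str:
    """Turn a prior successful episode into a few-shot block for the new task."""
    worked_example = "\n".join(f"  {i + 1}. {s}" for i, s in enumerate(episode.steps))
    return (
        f"A similar request ('{episode.goal}') previously succeeded with:\n"
        f"{worked_example}\n"
        f"Apply the same approach to: {new_request}"
    )

episode = Episode(
    goal="create a resource plan",
    steps=["gather team availability", "draft timeline as a Gantt chart", "review with user"],
    outcome="that worked",
)
prompt = to_few_shot(episode, "plan resources for Project X")
```

The retrieval step (finding the closest episode in the vector store) is omitted; the point is the final shape: yesterday’s successful steps become today’s few-shot scaffold.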
<hr />
<h2 id="heading-retrieval-augmented-generation-rag">Retrieval-Augmented Generation (RAG)</h2>
<p>RAG extends memory beyond what fits in the prompt:</p>
<ul>
<li><p>Index your knowledge base into a vector store.</p>
</li>
<li><p>Let the agent retrieve relevant chunks on demand.</p>
</li>
<li><p>Feed those chunks into the context window.</p>
</li>
</ul>
<p>This lets agents answer with <strong>organizational knowledge</strong> instead of hallucinations.</p>
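The three steps above can be sketched end to end. This toy retriever scores by term overlap purely for illustration; a real system would use embeddings and a vector store such as Elasticsearch.

```python
def score(query: str, doc: str) -> float:
    """Toy relevance score: term overlap. Real systems use vector similarity."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Return the k most relevant chunks for the query."""
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str, corpus: list) -> str:
    """Feed retrieved chunks into the context window ahead of the question."""
    chunks = retrieve(query, corpus)
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Quotes require an approved delivery date before submission.",
    "The cafeteria menu changes every Monday.",
    "Delivery dates are set by the logistics team.",
]
prompt = build_prompt("When can the delivery date be set on a quote?", corpus)
```

Only the relevant organizational knowledge reaches the model; the cafeteria chunk never enters the context window.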
<p>👉 At Mezzoic, ElasticSearch has proven to be the most stable and scalable option, but the space is evolving quickly.</p>
<hr />
<h2 id="heading-mezzoics-approach-to-context">Mezzoic’s Approach to Context</h2>
<p>In Mezzoic, context management is <strong>built into the workflow orchestrator and MCP client layer</strong>, not bolted on afterward.</p>
<ul>
<li><p><strong>Asynchronous, event-driven updates</strong> prevent the model from being overloaded with stale or irrelevant info.</p>
</li>
<li><p><strong>LangChain, LangGraph, and LangMem</strong> provide the tooling, but we customize them heavily. They’re powerful, but they come with performance costs.</p>
</li>
<li><p><strong>Prompt and tool descriptions</strong> are versioned and tested alongside code, ensuring consistency.</p>
</li>
</ul>
<p>Result:<br />Agents that stay <strong>focused, adaptive, and consistent</strong> without drowning in irrelevant context.</p>
<hr />
<h2 id="heading-practical-takeaways">Practical Takeaways</h2>
<ul>
<li><p><strong>Context is the main design problem.</strong> Treat it as first-class architecture.</p>
</li>
<li><p><strong>Documentation becomes prompts.</strong> Maintain them like code.</p>
</li>
<li><p><strong>Token limits force discipline.</strong> Use summarization, focus, and hybrid strategies.</p>
</li>
<li><p><strong>Memory makes agents human-like.</strong> Semantic = facts; episodic = experience.</p>
</li>
<li><p><strong>RAG extends knowledge.</strong> It’s the way to scale beyond what fits in context.</p>
</li>
<li><p><strong>Build it into the platform.</strong> Don’t duct-tape context management after the fact.</p>
</li>
</ul>
<hr />
<h2 id="heading-closing">Closing</h2>
<p>LLMs aren’t limited by their intelligence — they’re limited by their context.<br />Get context right, and everything else becomes easier.</p>
<p>👉 Next in this series: <strong>Security &amp; Trust</strong> — how to keep agents safe, scoped, and governed with the same policies as your web apps.</p>
]]></content:encoded></item><item><title><![CDATA[From APIs to Agent Tools: Designing for Multi-Agent Systems]]></title><description><![CDATA[Quick Overview
In this article, I explore key principles of agent and tool design — and how they integrate with your backend in ways that can make or break your project.
We’ll look at the problem areas that matter most:

Why multi-agent systems → whe...]]></description><link>https://yourenterprisearchitect.com/from-apis-to-agent-tools-designing-for-multi-agent-systems</link><guid isPermaLink="true">https://yourenterprisearchitect.com/from-apis-to-agent-tools-designing-for-multi-agent-systems</guid><category><![CDATA[agentic AI]]></category><category><![CDATA[#agent]]></category><category><![CDATA[agents]]></category><category><![CDATA[Swarm]]></category><category><![CDATA[Multi-Agent Systems (MAS)]]></category><category><![CDATA[multi-agent]]></category><dc:creator><![CDATA[Ruben Rotteveel]]></dc:creator><pubDate>Mon, 15 Sep 2025 16:26:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1757968081008/6949f1d9-158f-4fb7-9077-514ad63529fd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-quick-overview">Quick Overview</h2>
<p>In this article, I explore key principles of <strong>agent and tool design</strong> — and how they integrate with your backend in ways that can make or break your project.</p>
<p>We’ll look at the problem areas that matter most:</p>
<ul>
<li><p><strong>Why multi-agent systems</strong> → when and why specialization helps.</p>
</li>
<li><p><strong>Organizing agent tools and responsibilities</strong> → minimizing context switches and enabling success.</p>
</li>
<li><p><strong>MCP design concepts</strong> → using Model Context Protocol to wrap APIs safely.</p>
</li>
<li><p><strong>Integration patterns</strong> → how to connect agents to your APIs without breaking what already works.</p>
</li>
</ul>
<p>The goal: a clear set of principles you can apply to design agents that are <strong>effective, safe, and production-ready</strong>.</p>
<h2 id="heading-1-why-multi-agent-systems">1. Why Multi-Agent Systems</h2>
<p>One agent with every tool sounds simple. In practice, it quickly collapses under complexity. Multi-agent systems solve this by:</p>
<ul>
<li><p><strong>Specialization</strong> → Agents are scoped to a domain (e.g., Business Analyst, Project Manager, Resource Manager). Narrow context = better focus.</p>
</li>
<li><p><strong>Reduced context switching</strong> → Each agent carries fewer tools and instructions. Smaller prompt = more consistent behavior.</p>
</li>
<li><p><strong>Collaboration</strong> → Agents hand off results or share context where needed.</p>
</li>
</ul>
<p>Think of it as <strong>organizational design for AI</strong>: teams succeed when roles are clear, responsibilities aligned, and overlap limited to backup coverage.</p>
<h3 id="heading-orchestration-supervisor-vs-swarm">Orchestration: Supervisor vs. Swarm</h3>
<p>Multi-agent systems don’t just need specialized roles — they need a way to decide who acts when. Two common orchestration models are:</p>
<ul>
<li><p><strong>Supervisor pattern</strong> → A central coordinator (a “manager agent”) delegates tasks to domain agents and integrates results. Great for hierarchical workflows and strict governance.</p>
</li>
<li><p><strong>Swarm pattern</strong> → Each agent is aware of some or all of the other agents and their expertise. When a task falls outside its bounds, it hands the task off to the agent whose expertise fits best. That specialist takes over the conversation, and you engage with it until it reaches its own limits, at which point it hands off to another specialist, and so on.</p>
</li>
</ul>
<p>Mezzoic uses a swarm approach. This means:</p>
<ul>
<li><p>Agents are given overlapping tool access to minimize context switching.</p>
</li>
<li><p>Context and workflow orchestration live in the platform, not in a single “boss agent.”</p>
</li>
</ul>
<p>👉 The trade-off: swarms require more careful tool design (clear descriptions, scoped responsibilities, overlapping coverage) so agents don’t collide or wander.</p>
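A toy sketch of swarm-style handoff (keyword routing stands in for the LLM’s own judgment; the agent names echo the roles above, the rest is illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    expertise: set                       # topics this specialist owns
    peers: list = field(default_factory=list)

    def handle(self, task: str) -> str:
        topic = task.split(":", 1)[0]
        if topic in self.expertise:
            return f"{self.name} handled '{task}'"
        # Out of bounds: hand off to the peer whose expertise matches.
        for peer in self.peers:
            if topic in peer.expertise:
                return peer.handle(task)
        return f"{self.name} escalated '{task}'"

analyst = Agent("BusinessAnalyst", {"requirements"})
pm = Agent("ProjectManager", {"planning"})
analyst.peers, pm.peers = [pm], [analyst]

result = analyst.handle("planning: draft the Q3 roadmap")
```

Note that there is no central coordinator: the routing knowledge lives in each agent’s awareness of its peers, which is exactly why the tool and expertise descriptions have to be sharp.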
<h2 id="heading-2-organizing-agent-tools-and-responsibilities">2. Organizing Agent Tools and Responsibilities</h2>
<p>Tools are the bridge between your backend and your agents. Organizing them well is critical.</p>
<p><strong>Principles:</strong></p>
<ul>
<li><p><strong>Start from use cases</strong> → Map what the agent must accomplish end-to-end, then derive tools.</p>
</li>
<li><p><strong>Minimize context switches</strong> → Prefer one agent completing a flow over bouncing between agents.</p>
</li>
<li><p><strong>Specialize, but don’t over-partition</strong> → Give agents enough overlap to complete work instead of stalling.</p>
</li>
</ul>
<p><strong>Exclusive domains:</strong><br />Each tool should have a <strong>clear, non-overlapping responsibility</strong>. Ambiguity (two tools that can both edit a quote) leads to failure.</p>
<p>How to define:</p>
<ul>
<li><p><strong>Purpose</strong> → specific outcome it achieves.</p>
</li>
<li><p><strong>Scope</strong> → what it <em>doesn’t</em> do.</p>
</li>
<li><p><strong>Trigger conditions</strong> → when the agent should call it.</p>
</li>
</ul>
<p>👉 <em>Treat tools like team roles: overlap creates confusion, clarity enables success.</em></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757534752208/f147c01f-28ba-4d6a-8cbb-7bdcecc11c5a.png" alt class="image--center mx-auto" /></p>
<p>Agent to tool mapping diagram:</p>
<p>These agents have a primary set of tools that enable their roles, but they also share access to common utilities, the user’s details, security modules, and any tangentially related tools they may need.</p>
<p>These tools are thin wrappers around mcp_clients, which in turn are proxies to mcp_servers.</p>
<hr />
<h2 id="heading-3-mcp-design-concepts">3. MCP Design Concepts</h2>
<p>What is an MCP server? MCP servers are a standard way to provide tools to your agents, just as APIs are a standard way to expose your core business logic. In a microservices architecture, there’s typically another layer above your microservices: a specialized API wrapper that implements a set of use cases on top of the microservices but is dedicated to one application. These Backends for Frontends (BFFs) are the closest analogue to MCP servers: an MCP server is, in effect, a BFF for your agent, wrapping your microservices in a dedicated tool interface.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1757682610384/eb68a2fb-3dc9-409a-ad42-95f0127b20b2.png" alt class="image--center mx-auto" /></p>
<p><strong>MCP servers:</strong></p>
<ul>
<li><p>Wrap existing APIs in a standard interface agents can query at runtime.</p>
</li>
<li><p>Expose available actions (<code>quote.add_line_item</code>, <code>quote.set_delivery_date</code>) instead of generic endpoints.</p>
</li>
<li><p>Provide consistent descriptions that become the agent’s “manual.”</p>
</li>
</ul>
<h3 id="heading-marshal-by-value-vs-marshal-by-reference">Marshal-by-Value vs Marshal-by-Reference</h3>
<p>Let’s briefly talk about marshal-by-value vs. marshal-by-reference. Agents have a hard time constructing complex objects: an LLM doesn’t edit a JSON object, it regenerates it on the fly, and more often than not gets it wrong. You can mitigate this with a rich set of instructions describing every field, but that’s a lot of work, adds to context, and the model will still get it wrong. Instead, we design tools with simple responsibilities and parameters that are easy to understand.</p>
<ul>
<li><p><strong>Traditional APIs (marshal-by-value):</strong></p>
<ul>
<li><p>Clients (like SPAs) hold a rich domain object (e.g., a Quote).</p>
</li>
<li><p>They apply business logic locally, then send the whole or partial object back (<code>PUT /quote {…}</code>).</p>
</li>
<li><p>Works great with deterministic clients — humans don’t forget required fields.</p>
</li>
</ul>
</li>
<li><p><strong>Agents (marshal-by-reference):</strong></p>
<ul>
<li><p>LLMs are bad at reconstructing complex objects. They often drop fields, mis-format, or overwrite.</p>
</li>
<li><p>Instead, agents do better when they reference an <strong>entity ID</strong> and apply <strong>small, intent-driven changes</strong>.</p>
</li>
<li><p>Example:</p>
<ul>
<li><p><code>quote.set_delivery_date(quote_id, date)</code></p>
</li>
<li><p><code>quote.add_line_item(quote_id, sku, qty, price)</code></p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>Why it matters:</p>
<ul>
<li><p>Smaller prompts → less chance of hallucination.</p>
</li>
<li><p>Narrow scope → fewer invalid states.</p>
</li>
<li><p>Server enforces business rules → no fragile prompt gymnastics.</p>
</li>
</ul>
<p>Yes, it’s <strong>two hops</strong> (fetch + patch), but in agent workflows the LLM’s “thinking time” dwarfs network latency. Reliability wins over raw performance.</p>
<h3 id="heading-tool-design-patterns-recap">Tool Design Patterns (recap)</h3>
<ul>
<li><p><strong>Intent verbs &gt; CRUD</strong> → tools reflect user goals, not table ops.</p>
</li>
<li><p><strong>Patch over PUT</strong> → safe partial updates instead of risky overwrites.</p>
</li>
<li><p><strong>Validation at the edge</strong> → business rules live in MCP, not the LLM.</p>
</li>
<li><p><strong>Idempotency</strong> → every tool call safe to retry.</p>
</li>
</ul>
<h3 id="heading-example">Example</h3>
<p>Instead of:</p>
<pre><code class="lang-json">PUT /quotes/<span class="hljs-number">123</span>
{
  <span class="hljs-attr">"id"</span>: <span class="hljs-string">"123"</span>,
  <span class="hljs-attr">"customer"</span>: { ... },
  <span class="hljs-attr">"lineItems"</span>: [ ... ],
  <span class="hljs-attr">"deliveryDate"</span>: <span class="hljs-string">"2025-09-12"</span>,
  ...
}
</code></pre>
<p>Expose tools like:</p>
<ul>
<li><p><code>quote.set_delivery_date(quote_id, date)</code></p>
</li>
<li><p><code>quote.add_line_item(quote_id, sku, qty, price)</code></p>
</li>
<li><p><code>quote.assign_owner(quote_id, user_id)</code></p>
</li>
</ul>
<p>Each one mutates a small part of the aggregate by reference. The MCP server handles the fetch → patch → persist cycle.</p>
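A minimal sketch of one such tool on the server side (the in-memory store and the validation rule are illustrative stand-ins for the real quote service): the agent passes only an ID and an intent, and the server owns the fetch → patch → persist cycle.

```python
from datetime import date

# Illustrative in-memory store standing in for the real quote service.
QUOTES = {"123": {"id": "123", "delivery_date": None, "line_items": []}}

def set_delivery_date(quote_id: str, delivery_date: str) -> dict:
    """Tool: quote.set_delivery_date -- the agent never sees the full object."""
    quote = QUOTES[quote_id]                      # fetch by reference
    parsed = date.fromisoformat(delivery_date)    # validation at the edge
    if parsed < date.today():
        raise ValueError("delivery date must be in the future")
    quote["delivery_date"] = delivery_date        # patch one field
    QUOTES[quote_id] = quote                      # persist
    return {"id": quote_id, "delivery_date": delivery_date}

result = set_delivery_date("123", "2030-09-12")
```

The call is also naturally idempotent: retrying it with the same arguments leaves the quote in the same state, which is exactly the property the recap above asks for.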
<h2 id="heading-4-integration-patterns-with-apis">4. Integration Patterns with APIs</h2>
<p>The second principle of SOLID — <strong>Open/Closed</strong> — matters here: extend, don’t modify. Keep your backend stable, add MCP Servers as extensions. Don’t break what works.</p>
<p><strong>Patterns:</strong></p>
<ul>
<li><p><strong>Backend-for-Frontend (BFF)</strong> → MCP server as the backend for your agent, wrapping API calls into intent-driven tools.</p>
</li>
<li><p><strong>Sidecar</strong> → MCP deployed alongside a service, tightly coupled to its API.</p>
</li>
<li><p><strong>Gateway</strong> → centralized MCP layer, routing requests to many services.</p>
</li>
</ul>
<p><strong>Security:</strong></p>
<ul>
<li><p>Agents act <em>on behalf of</em> users → OAuth OBO flow.</p>
</li>
<li><p>Tools inherit API-level RBAC/ABAC → no shadow permissions.</p>
</li>
<li><p>Narrow scopes → <code>tool:quote.read</code>, <code>tool:quote.create</code>.</p>
</li>
</ul>
<hr />
<h2 id="heading-5-trade-offs-and-reality-check">5. Trade-offs and Reality Check</h2>
<p><strong>Pros:</strong></p>
<ul>
<li><p>Clearer prompts, fewer invalid states.</p>
</li>
<li><p>Agents behave more predictably.</p>
</li>
<li><p>Security and governance aligned with backend.</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>More tools to design and maintain.</p>
</li>
<li><p>Extra round trips (fetch + patch).</p>
</li>
<li><p>Need for concurrency control.</p>
</li>
</ul>
<p>But in practice: <strong>reliability &gt; raw RPS</strong>. Agents spend more time “thinking” than calling APIs. Safe, intent-driven tools are worth the overhead.</p>
<hr />
<h2 id="heading-6-practical-takeaways">6. Practical Takeaways</h2>
<ul>
<li><p>APIs don’t translate 1:1 into tools.</p>
</li>
<li><p>Wrap APIs into <strong>intent-driven MCP servers</strong>.</p>
</li>
<li><p>Organize tools with <strong>exclusive domains</strong> and clear responsibilities.</p>
</li>
<li><p>Treat prompts and tool descriptions as <strong>first-class code</strong> — versioned, tested, and owned.</p>
</li>
<li><p>Build agents as <strong>extensions</strong>, not modifications, of your backend.</p>
</li>
</ul>
<hr />
<h2 id="heading-closing">Closing</h2>
<p>APIs power your systems. But agents need something more: <strong>tools</strong> they can discover, understand, and use safely. MCP servers bridge that gap, transforming APIs into intent-driven capabilities that agents can wield reliably.</p>
<p>The takeaway: <strong>don’t let agents call APIs raw</strong>. Wrap them, describe them, test them — and you’ll move from fragile demos to production-ready multi-agent systems.</p>
]]></content:encoded></item></channel></rss>