Security · Production · AWS · Azure · GCP

Securing Agentic AI: A Layered Defense Playbook

Agents aren't chatbots with extra steps — they read untrusted text, hold credentials, call tools, write to memory, and reach the public internet. Securing one means securing seven layers at once. Here's how to do it, with reference architectures for AWS Bedrock AgentCore, Azure AI Foundry Agents, and Gemini Enterprise / Vertex Agent Engine, plus the open-source stack that fills the gaps.

AgentSwarms Authors

June 9, 2026 · 22 min read


A 2024-era chatbot had one attack surface: the prompt. A 2026-era agent has at least seven. It authenticates as a workload identity, reads documents an attacker may have written, decides which tools to call, mutates a long-lived memory store, talks to external APIs, runs sandboxed code, and leaves an audit trail you'll either trust in court or won't. Every one of those is a separate trust boundary, and they fail in ways the classic AppSec playbook doesn't cover.

This post is the playbook we wish we'd had on day one of shipping agents to enterprises. We'll move top-down through the layers, name the threats at each, list the controls that close them, then translate the abstract picture into concrete reference architectures on the three platforms most readers are deploying on: AWS Bedrock AgentCore, Azure AI Foundry Agent Service, and Google Vertex AI Agent Engine / Gemini Enterprise. We'll end with the open-source and third-party stack that picks up where managed services stop.

How to read this post

If you ship to one cloud, skim the other two — the patterns translate. If you're an early-stage team, jump to the Layered defense section and the Open-source stack at the end. If you're an enterprise architect, the reference diagrams are designed to drop into a threat model.

Why agents are a new attack surface

Three properties make agentic systems different, and harder to defend, from a security standpoint:

They mix trust levels in the same context window. A system prompt (trusted), a user message (semi-trusted), retrieved documents (often untrusted), tool outputs (untrusted), and conversation memory (variable) all become one flat string the model reasons over. The model has no built-in mechanism to keep them apart.
They hold credentials and act on them. Unlike a stateless chatbot, an agent calls APIs, writes to databases, sends email, executes code. A successful injection isn't just a wrong answer — it's an unauthorized transaction.
They learn, remember, and self-modify. Long-term memory, skills, sub-agent spawning and self-evaluation mean today's safe agent can quietly drift into tomorrow's compromised one without a code change.

Simon Willison crystallized the worst-case as the lethal trifecta: any agent that simultaneously has (1) access to private data, (2) exposure to untrusted content, and (3) the ability to communicate externally can be turned into a data-exfiltration tool by a single well-crafted document. The whole layered model below is, in essence, a discipline for never letting all three line up at once — or, if they must, ringing them with so many tripwires the attack still fails.

We covered this from the architecture side in Production System Design for Agentic AI and from the failure-mode side in 7 Failure Modes That Kill Multi-Agent Systems. This post is the security-first companion.

The seven layers of agent security

The diagram below shows, for one example layer (prompt and input), the dominant threats and the controls that earn their keep. Each subsequent section drills into a layer in detail.

Threats

• Prompt injection (direct + indirect)
• Jailbreaks
• Data poisoning via RAG sources

Controls

• Input classifiers / tripwires
• Trust tagging of sources
• Strip HTML / hidden unicode / instructions in retrieved docs

Each of the seven layers pairs the threats that show up there with the cheapest controls that close most of them. Skipping a layer is rarely free; the threat just resurfaces one tier up.

Mapping to OWASP LLM Top 10 (2025)

If you owe an auditor a checklist, the OWASP LLM Top 10 is the lingua franca. Here's how the layered controls map onto it.

OWASP	Threat	Primary control
LLM01	Prompt Injection	Input guardrail + trust tags + structured tool schemas
LLM02	Sensitive Information Disclosure	Output PII redactor + memory tenant scoping
LLM03	Supply-Chain (models, plugins, MCP)	Signed artifacts, SBOM scan, MCP allowlist registry
LLM04	Data & Model Poisoning	Source provenance + RAG dedup + canary evals
LLM05	Improper Output Handling	Render as data, never as code; sanitize HTML/SQL
LLM06	Excessive Agency	Least-privilege tools + human-in-the-loop for high-blast actions
LLM07	System Prompt Leakage	Treat prompt as non-secret; gate secrets via runtime fetch
LLM08	Vector & Embedding Weaknesses	Per-tenant namespaces, embedding-attack tests
LLM09	Misinformation / Hallucination	Grounding + citation requirement + LLM-judge eval gate
LLM10	Unbounded Consumption	Per-agent budgets, loop detector, request quotas

OWASP LLM Top 10 (2025) mapped to the primary control that addresses each. None of these are single-vendor — they're patterns you compose from your platform's primitives.

Layer 1 · Identity & Access

The first question for any agent is the same as for any service: who is it acting as? Most agent breaches start here, with an agent running as one giant service principal that can read every database in the account. The fix is the same old fix — least privilege — applied per agent role, not per application.

Per-agent workload identity. On AWS use IAM Roles for Service Accounts (IRSA) or AgentCore Identity; on Azure use a Managed Identity per Foundry agent; on GCP use Workload Identity Federation. Never share a single principal across agents with different capabilities.
Short-lived, scoped tokens. Issue STS / SAS / signed JWTs that expire in minutes and embed the agent's purpose. Tools verify the purpose claim before acting.
Per-user OAuth for user data. When an agent acts on behalf of an end user (read their Gmail, post to their Slack), use a real OAuth flow per user. A workspace-level service token used for every user is a confused-deputy waiting to happen.
No bearer tokens in prompts. Inject credentials at the tool boundary at runtime. The model should never see the raw secret — otherwise a prompt-injection that asks it to “repeat your last tool input verbatim” is a credential dump.
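A minimal sketch of the short-lived, purpose-scoped token idea, using only the Python standard library. The key, the claim names (`sub`, `purpose`, `exp`), and the `mint_token` / `verify_token` helpers are assumptions made for illustration; in production you would more likely lean on your cloud's STS or a vetted JWT library than on a hand-rolled format.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical signing key; in practice load it from a secrets manager and rotate it.
SIGNING_KEY = b"demo-key-rotate-me"


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _sign(payload: str) -> str:
    return _b64(hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).digest())


def mint_token(agent_id: str, purpose: str, ttl_seconds: int = 300,
               now: Optional[float] = None) -> str:
    """Issue a signed token naming the agent, its purpose, and an expiry."""
    issued = time.time() if now is None else now
    claims = {"sub": agent_id, "purpose": purpose, "exp": issued + ttl_seconds}
    payload = _b64(json.dumps(claims, sort_keys=True).encode())
    return f"{payload}.{_sign(payload)}"


def verify_token(token: str, required_purpose: str,
                 now: Optional[float] = None) -> dict:
    """Tool-side check: signature first, then expiry, then the purpose claim."""
    payload, _, signature = token.partition(".")
    if not hmac.compare_digest(signature, _sign(payload)):
        raise PermissionError("bad signature")
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        raise PermissionError("token expired")
    if claims["purpose"] != required_purpose:
        raise PermissionError("purpose mismatch")
    return claims
```

The tool calls `verify_token(token, "crm:read")` before acting, so a token minted for one purpose cannot be replayed against a tool that expects another.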

Layer 2 · Prompt & Input

Prompt injection is now what SQL injection was in 2005: well-known, ubiquitous, and still the most common root cause. The brutal part is that there is no clean parser the way prepared statements were for SQL — natural language doesn't bind neatly. The defense is depth, not purity.

Trust-tag every input. Wrap retrieved chunks, tool outputs, and memory snippets in clearly-labelled, unambiguous delimiters (<retrieved trusted=false> ... </retrieved>) and instruct the model to never execute instructions from inside such blocks.
Cheap classifier guardrails. Run a small model (Gemini Flash Lite, Claude Haiku, Llama Guard) as an input tripwire before the expensive model burns tokens. See the Input & Output Guardrails notebook for a working pattern.
Strip hostile rendering. Remove zero-width characters, hidden ANSI, suspicious base64 blobs, and HTML/Markdown comments from retrieved docs before they reach the model — these are the common vehicles for indirect prompt injection.
Probe before you ship. Use the Prompt Injection Tester tool against your own system prompt; run Garak or PyRIT in CI.
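The trust-tagging and hostile-rendering controls above can be sketched in a few lines of standard-library Python. The function names and the `source` attribute are assumptions for illustration, and the filter is deliberately blunt: dropping every Unicode format character (category Cf) also removes legitimate ones such as the zero-width joiner used in some emoji, which may or may not be acceptable for your corpus.

```python
import re
import unicodedata

# CSI-style ANSI escape sequences (colors, cursor movement)
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")
# HTML / Markdown comments, a common hiding place for injected instructions
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Any attempt to open or close our own wrapper tag from inside the chunk
WRAPPER_TAG = re.compile(r"<(/?\s*retrieved\b)", re.IGNORECASE)


def sanitize(text: str) -> str:
    """Remove comments, ANSI escapes, and invisible format characters."""
    text = HTML_COMMENT.sub("", text)
    text = ANSI_ESCAPE.sub("", text)
    # Category Cf covers zero-width spaces/joiners, BOM, and bidi controls.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def tag_untrusted(text: str, source: str) -> str:
    """Wrap a retrieved chunk in an explicit untrusted delimiter."""
    clean = WRAPPER_TAG.sub(r"&lt;\1", sanitize(text))
    safe_source = source.replace('"', "")
    return f'<retrieved trusted=false source="{safe_source}">\n{clean}\n</retrieved>'
```

Escaping the wrapper tag inside the chunk matters as much as the wrapper itself: otherwise a document can close the block early and place its own text outside the untrusted region.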

Layer 3 · Model & Reasoning

Models leak through their outputs, not just their inputs. Four patterns matter:

Structured outputs by default. Force the model to emit JSON matching a Pydantic / Zod schema. A schema rejects three classes of attack — malformed tool calls, unexpected fields used to smuggle instructions, and ‘free-form’ replies that bypass downstream parsers. See Pydantic — The Contract Layer of Agentic AI.
Hidden chain-of-thought. Never surface the model's reasoning text to the caller. CoT routinely contains intermediate secrets (database rows the model considered then discarded, raw API responses, etc.). Strip it server-side.
Use safety-tuned variants where they exist. Bedrock Guardrails, Azure Content Safety + Prompt Shields, and Gemini Safety Settings catch obvious-bad without you having to write a classifier.
Pin model versions. Don't let a silent upgrade of gpt-4-latest revert your jailbreak fixes. Pin per environment and ship version bumps through the same eval gate as code.
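To make the structured-output point concrete, here is a dependency-free stand-in for the kind of check a Pydantic or Zod schema performs. The tool names, argument schemas, and `parse_tool_call` helper are hypothetical; the sketch only shows the three rejections named above: free-form replies, unexpected fields, and malformed tool calls.

```python
import json

# Hypothetical tool catalogue: tool name -> {argument name: exact type}
ALLOWED_TOOLS = {
    "crm_lookup": {"customer_id": str},
    "search_kb": {"query": str, "top_k": int},
}


class SchemaError(ValueError):
    """Raised when a model reply does not match the expected tool-call shape."""


def parse_tool_call(raw: str) -> dict:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaError("reply is not JSON") from exc  # free-form reply
    if not isinstance(data, dict) or set(data) != {"tool", "args"}:
        raise SchemaError("expected exactly the keys 'tool' and 'args'")
    tool, args = data["tool"], data["args"]
    if not isinstance(tool, str) or tool not in ALLOWED_TOOLS:
        raise SchemaError("unknown tool")
    if not isinstance(args, dict):
        raise SchemaError("'args' must be an object")
    schema = ALLOWED_TOOLS[tool]
    if set(args) != set(schema):
        raise SchemaError("missing or unexpected argument")  # smuggled fields
    for name, expected_type in schema.items():
        # Exact type match, so a JSON boolean cannot pass as an integer.
        if type(args[name]) is not expected_type:
            raise SchemaError(f"argument {name!r} has the wrong type")
    return data
```

A rejected reply should be treated as a failed step and logged, not silently retried with looser parsing.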

Layer 4 · Tools / MCP

Tools are where intent becomes action. Four things separate a well-secured tool layer from a disaster waiting to happen:

Capability scoping per agent role. A summarizer agent doesn't get the send_email tool, period. Don't pass “all available tools” into every agent; the LLM will eventually find a creative use for the one you forgot to remove.
Tool-broker as policy point. Put a thin server between the agent and the tool that re-validates inputs against a schema, checks the calling agent's identity, applies per-tool rate limits, and writes an audit row. The model can lie about its intent; the broker can't be talked out of its checks.
MCP servers behind an allowlist registry. The MCP ecosystem is exploding and packages get yanked, replaced, or quietly compromised. Maintain an internal registry of pinned, signed MCP servers — see MCP Production Playbook 2026.
Human-in-the-loop for high-blast actions. Any action that costs money, sends a customer message, or mutates production data should pause for explicit approval. Build it into the agent loop from day one; bolting it on later is expensive.
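The broker pattern might look roughly like the sketch below. `ToolSpec`, `ToolBroker`, and the role and tool names are invented for illustration, and the in-memory rate limiter and audit list stand in for whatever shared store a real deployment would use.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ToolSpec:
    handler: Callable[..., Any]
    allowed_roles: frozenset        # capability scoping per agent role
    required_args: frozenset        # stand-in for a full input schema
    max_calls_per_minute: int = 30
    needs_approval: bool = False    # human-in-the-loop for high-blast actions


@dataclass
class ToolBroker:
    tools: dict
    audit_log: list = field(default_factory=list)
    _calls: dict = field(default_factory=dict)

    def call(self, role: str, tool: str, args: dict, approved: bool = False,
             now: Optional[float] = None) -> Any:
        ts = time.time() if now is None else now
        key = (role, tool)
        # Keep only calls from the last 60 seconds for the rate check.
        window = [t for t in self._calls.get(key, []) if ts - t < 60]
        reason = self._deny_reason(self.tools.get(tool), role, args, approved, len(window))
        # Every decision, allowed or denied, gets an audit row.
        self.audit_log.append({"ts": ts, "role": role, "tool": tool,
                               "allowed": reason is None, "reason": reason})
        if reason is not None:
            raise PermissionError(reason)
        self._calls[key] = window + [ts]
        return self.tools[tool].handler(**args)

    @staticmethod
    def _deny_reason(spec, role, args, approved, recent_calls):
        if spec is None or role not in spec.allowed_roles:
            return "tool not granted to this role"
        if set(args) != set(spec.required_args):
            return "arguments do not match schema"
        if recent_calls >= spec.max_calls_per_minute:
            return "rate limit exceeded"
        if spec.needs_approval and not approved:
            return "human approval required"
        return None
```

The `approved` flag is assumed to be set by a separate approval workflow, never by the model; the broker only trusts it because the agent has no way to pass it.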

The lethal trifecta

If your agent can read sensitive data, can be exposed to untrusted text, and can talk to the outside world — and you cannot remove one of those legs — assume an exfiltration channel exists. Add an egress proxy, an output guardrail that scans for known-secret patterns, and rate-limit external calls. The architecture in Production System Design walks through breaking the trifecta in detail.
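One way to keep the trifecta visible is to encode it as a check that runs over your agent inventory, for example in CI. The `AgentProfile` fields and mitigation labels below are hypothetical names chosen for this sketch; the three mitigations mirror the ones listed in the paragraph above.

```python
from dataclasses import dataclass

# Labels for the compensating controls named above (hypothetical identifiers)
REQUIRED_MITIGATIONS = frozenset(
    {"egress_proxy", "output_guardrail", "external_rate_limit"}
)


@dataclass(frozen=True)
class AgentProfile:
    name: str
    reads_private_data: bool
    sees_untrusted_content: bool
    can_communicate_externally: bool
    mitigations: frozenset = frozenset()


def has_lethal_trifecta(agent: AgentProfile) -> bool:
    """True only when all three legs are present on the same agent."""
    return (agent.reads_private_data
            and agent.sees_untrusted_content
            and agent.can_communicate_externally)


def missing_mitigations(agent: AgentProfile) -> frozenset:
    """Controls still owed by an agent that cannot drop a leg."""
    if not has_lethal_trifecta(agent):
        return frozenset()
    return REQUIRED_MITIGATIONS - agent.mitigations
```

A build step could fail whenever `missing_mitigations` returns a non-empty set, which turns the trifecta from a design principle into a gate.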

Layer 5 · Memory & Data

Long-term memory is what makes agents useful and what makes them dangerous. A poisoned memory entry written on Monday silently steers every conversation that week. A multi-tenant agent that mixes one customer's notes into another's response is a breach with regulatory consequences.

Tenant-scoped namespaces. Every memory write/read passes through a tenant ID; the vector store enforces partition isolation, not the application code.
Provenance on every memory. Store who wrote this, when, from which session. When something looks off, you can trace the source — and revoke its successors.
Encrypt at rest with CMK. KMS / Key Vault / CMEK with customer-managed keys gives you a kill switch. Drop the key and the data is unreadable, even by the platform.
RAG source vetting. Treat indexed documents as part of your supply chain. Hash content, watch for unexpected diffs, and apply doc-staleness controls so the index doesn't drift into a poisoned state without you noticing.
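The namespace and provenance controls can be illustrated with a small in-memory store. This is a sketch under stated assumptions: `TenantMemory` and `MemoryEntry` are hypothetical, and the class plays the role of the storage layer here, whereas in a real system the partitioning should be enforced by the vector store itself as noted above.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoryEntry:
    tenant_id: str
    text: str
    author: str        # who wrote this
    session_id: str    # from which session
    written_at: float  # when
    content_hash: str  # lets you detect unexpected diffs later


class TenantMemory:
    def __init__(self) -> None:
        self._partitions: dict = {}

    def write(self, tenant_id: str, text: str, author: str, session_id: str,
              now: Optional[float] = None) -> MemoryEntry:
        if not tenant_id:
            raise ValueError("every write needs a tenant ID")
        entry = MemoryEntry(
            tenant_id=tenant_id,
            text=text,
            author=author,
            session_id=session_id,
            written_at=time.time() if now is None else now,
            content_hash=hashlib.sha256(text.encode()).hexdigest(),
        )
        self._partitions.setdefault(tenant_id, []).append(entry)
        return entry

    def read(self, tenant_id: str) -> list:
        # Reads can only ever see the caller's own partition.
        return list(self._partitions.get(tenant_id, []))

    def revoke_session(self, tenant_id: str, session_id: str) -> int:
        """Remove everything a suspect session wrote; returns the count removed."""
        before = self._partitions.get(tenant_id, [])
        kept = [e for e in before if e.session_id != session_id]
        self._partitions[tenant_id] = kept
        return len(before) - len(kept)
```

Because each entry records its session, a poisoned write discovered on Friday can be traced and revoked along with everything else that session stored.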

Layer 6 · Network & Runtime

Code that an LLM generated and an LLM decided to run is, by definition, not something you reviewed. Run it like you'd run user-supplied code: in a sandbox, in a private network, with egress controls.

Sandboxed code execution. Bedrock AgentCore's Code Interpreter, Vertex's sandboxed exec, or self-hosted E2B / Firecracker / gVisor microVMs. No persistent filesystem, no network unless explicitly enabled, hard CPU and wall-clock limits.
Egress allowlist. Force all outbound calls through a proxy that allowlists destinations. An injection that tries to POST a secret to attacker.example.com should fail at the network, not at the model.
Signed images + SBOM scanning. Sign every container with Cosign, scan with Snyk / Trivy on every build, refuse to deploy unsigned or critical-CVE images.
Private VPC, no public ingress to internals. Tools, memory stores, and vector DBs live in private subnets. The only public surface is the agent's API gateway.
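The egress rule reduces to a host check that a proxy applies before forwarding a request. A minimal sketch follows; the allowlisted hosts are placeholders, and a real proxy would also need to handle redirects and DNS rebinding, which this does not attempt.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; keep it small and review changes like code.
ALLOWED_HOSTS = frozenset({"api.example-crm.com", "hooks.slack.com"})


def egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Allow only HTTPS requests whose exact hostname is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL: refuse rather than guess
    if parts.scheme != "https":
        return False
    # .hostname strips userinfo and port, so "allowed.com@evil.com" resolves to evil.com
    host = (parts.hostname or "").lower().rstrip(".")
    return host in allowed_hosts
```

Matching on the exact parsed hostname, instead of a substring or suffix, is the design choice that defeats lookalike destinations such as `hooks.slack.com.attacker.example.com`.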

Layer 7 · Observability & Governance

If you can't see what your agents did, you can't prove they did it correctly — and you can't catch the day they stop. Observability is the layer that makes every other control enforceable.

Trace every step. OpenTelemetry GenAI conventions are now stable; emit one span per model call, per tool call, per guardrail decision. Hash inputs/outputs so traces are searchable without dumping PII into logs.
Tamper-evident audit log. For regulated workloads, write tool-call audit rows to an append-only store (S3 Object Lock, Azure immutable blob, GCS bucket lock).
Continuous eval + red-team in CI. Every prompt or tool change goes through an eval suite that includes injection attempts and known jailbreaks. Block the deploy if quality or safety regresses.
Per-agent budgets + anomaly alerts. Cost spikes are often the first signal of a runaway loop or a compromise. See Cost Control in Multi-Agent Systems.
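To show what tamper-evident means in practice, here is a small hash-chained log in standard-library Python. It is an illustration of the property, not a replacement for an append-only store such as the object-lock options above; the entry layout and helper names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append_entry(log: list, event: dict) -> dict:
    """Append an audit row whose hash covers the event and the previous hash."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev": prev, "hash": _digest(prev, event)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited, removed, or reordered row breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

Truncating the tail of the log is the one change a chain alone cannot detect, which is why the latest hash is usually anchored somewhere the writer cannot modify.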

User → WAF / Auth → Input Guardrail → Agent Loop → Tool Broker → Egress Proxy → Output Guardrail → Audit Log

A single request, eight trust boundaries. Any single layer can refuse, redact, or downgrade, which is what lets defense-in-depth survive a single mistake. Remove any single layer and the blast radius of a successful injection grows by an order of magnitude.

Reference architecture · AWS Bedrock AgentCore

Bedrock AgentCore (GA late 2025) is AWS's purpose-built agent runtime: session-isolated microVMs, an Identity service, a managed Memory store, a Gateway that exposes tools over MCP, and built-in Observability. It is opinionated about isolation, which is good for security.

A production-shape Bedrock AgentCore deployment, grouped by the four security domains: identity, runtime, data/tools, observability. Each block is a real AWS primitive — none of this is bespoke.

Minimal IaC sketch

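# Terraform: production-shape AgentCore agent.
# Assumes data.aws_iam_policy_document.bedrock_trust, var.model_arn, the knowledge
# base, and the crm_lookup Lambda are defined elsewhere in the module.
resource "aws_iam_role" "agent_runtime" {
  name               = "swarm-support-agent"
  assume_role_policy = data.aws_iam_policy_document.bedrock_trust.json
}

# Per-agent role, scoped to ONE knowledge base + ONE Lambda tool
data "aws_iam_policy_document" "agent_perms" {
  statement {
    actions   = ["bedrock:Retrieve", "bedrock:InvokeModel"]
    resources = [aws_bedrockagent_knowledge_base.support.arn, var.model_arn]
  }
  statement {
    actions   = ["lambda:InvokeFunction"]
    resources = [aws_lambda_function.crm_lookup.arn]
  }
}

# Attach the scoped policy; without this the role carries no permissions
resource "aws_iam_role_policy" "agent_perms" {
  name   = "agent-perms"
  role   = aws_iam_role.agent_runtime.id
  policy = data.aws_iam_policy_document.agent_perms.json
}

resource "aws_bedrock_guardrail" "support" {
  name                      = "support-guardrail"
  blocked_input_messaging   = "Sorry, I can't help with that request."
  blocked_outputs_messaging = "Sorry, I can't share that response."

  topic_policy_config {
    topics_config {
      name       = "competitors"
      definition = "Questions about or comparisons with competitor products."
      type       = "DENY"
    }
  }
  sensitive_information_policy_config {
    pii_entities_config {
      type   = "EMAIL"
      action = "BLOCK"
    }
    pii_entities_config {
      type   = "CREDIT_DEBIT_CARD_NUMBER"
      action = "BLOCK"
    }
  }
  contextual_grounding_policy_config {
    filters_config {
      type      = "GROUNDING"
      threshold = 0.7
    }
  }
}

# Code interpreter / browser run in isolated microVMs by default;
# session isolation is enforced by AgentCore, not by your code.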

AgentCore-specific wins

Session isolation is enforced by the runtime, not by your application code — a single agent process never sees two sessions' state. Bedrock Guardrails apply both to the model output AND to retrieved context (contextual grounding). Use both. The integration with CloudTrail gives you a no-extra-work audit log.

Reference architecture · Azure AI Foundry Agents

Azure AI Foundry Agent Service is Microsoft's hosted agent runtime, paired with Entra-based identity, Content Safety (including Prompt Shields), and tight integration into Azure AI Search and the broader Azure data plane. If your data lives in Microsoft 365 or Azure SQL, Foundry's per-user On-Behalf-Of (OBO) flow is the cleanest path to per-user ACLs.

An Azure AI Foundry agent deployed inside a VNet with Private Endpoints, Entra-managed identity, Prompt Shields, and Application Insights tracing. Note the OBO flow for any tool that touches user data.

# Azure AI Foundry: agent with Content Safety + Prompt Shields enabled.
# SYSTEM_PROMPT, ticket_tool, and user_token are assumed to be defined elsewhere.
# Keyword names for content safety, tracing, and OBO are illustrative;
# confirm them against the SDK version you pin.
import os

from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import AzureAISearchTool
from azure.identity import DefaultAzureCredential

project = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),  # managed identity, NOT a key
    conn_str=os.environ["FOUNDRY_CONN"],
)

# Tools that touch USER data use OBO so RBAC is enforced as that user,
# not as the agent's managed identity. Defined before the agent that uses it.
search_tool = AzureAISearchTool(
    index="kb-prod",
    on_behalf_of=user_token,
)

agent = project.agents.create_agent(
    model="gpt-4o-2025-04",
    name="support-agent",
    instructions=SYSTEM_PROMPT,                     # never contains secrets
    tools=[search_tool, ticket_tool],               # least-privilege tool set
    content_safety={                                # Prompt Shields ON
        "prompt_shield": {"mode": "block"},
        "protected_material": {"mode": "block"},
        "groundedness": {"mode": "warn", "threshold": 0.75},
    },
    tracing_enabled=True,                           # App Insights
)

Foundry-specific wins

Prompt Shields catches direct + indirect injection inline. Connected Agents over A2A let you keep specialist agents in separate Foundry projects with independent permissions, instead of one mega-agent that holds every capability. Defender for Cloud surfaces agent-specific findings (over-permissive identity, missing Content Safety) without extra wiring.

Reference architecture · Gemini Enterprise / Vertex Agent Engine

Google's stack splits across two products: Vertex AI Agent Engine (a managed runtime for agents you build with ADK, LangChain, or LangGraph) and Gemini Enterprise (a search + assistant layer over your connected data sources, with per-user ACL filtering). Both sit inside VPC Service Controls and benefit from Google's CMEK, IAM, and Model Armor primitives.

Vertex Agent Engine + Gemini Enterprise inside a VPC-SC perimeter. Per-user OAuth and Discovery Engine ACL filtering give you row-level access control on retrieval — uncommon outside this stack.

# Vertex Agent Engine: deploy an ADK agent with Model Armor + safety settings.
# SYSTEM_PROMPT, STRICT_SAFETY (a list of types.SafetySetting), assistant, question,
# and user_google_access_token are assumed to be defined elsewhere.
import requests
from google.adk.agents import Agent
from google.adk.tools import google_search
from google.genai import types
from vertexai import agent_engines

agent = Agent(
    name="research-agent",
    model="gemini-2.5-pro",
    instruction=SYSTEM_PROMPT,             # ADK's parameter is singular
    tools=[google_search],
    # Safety settings travel in the generation config; block HARM_CATEGORY_*
    generate_content_config=types.GenerateContentConfig(
        safety_settings=STRICT_SAFETY,
    ),
)

deployed = agent_engines.create(
    agent_engine=agent,
    display_name="research-prod",
    service_account="research-agent@proj.iam.gserviceaccount.com",  # least-priv SA
    # Model Armor scans BOTH inbound prompts and outbound responses.
    # The keyword below is illustrative; confirm how your SDK version attaches a template.
    model_armor={"prompt_template_id": "armor-prod-strict"},
    # CMEK + VPC-SC inherited from the project perimeter
)

# Gemini Enterprise streamAssist: per-user ACLs enforced by Discovery Engine
# so the assistant only retrieves docs the END USER can already see.
# Auth = the END USER's Google OAuth access token (not a service-account token),
# so Discovery Engine can filter results by that user's Drive / Workspace ACLs.
response = requests.post(
    f"https://discoveryengine.googleapis.com/v1alpha/{assistant}:streamAssist",
    headers={"Authorization": f"Bearer {user_google_access_token}"},
    json={"query": {"text": question},
          "toolsSpec": {"vertexAiSearchSpec": {}}},
    timeout=30,
)

GCP-specific wins

VPC Service Controls is the strongest data-exfiltration boundary of the three clouds — it blocks even authenticated API calls that would move data outside your perimeter. Discovery Engine ACL inheritance means a Gemini Enterprise assistant cannot return a document the calling user couldn't already open in Drive. That's per-row authorization for free.

Open-source & 3rd-party stack

Managed services give you a strong baseline, but real production agents lean on open-source and third-party tools for the parts the platforms don't cover well — model-aware red-teaming, runtime AI firewalls, multi-cloud observability, deeper sandboxing.

Guardrails

• NVIDIA NeMo Guardrails
• Guardrails AI
• Llama Guard 3 / Prompt Guard
• Lakera Guard

Red-team / Eval

• Garak (NVIDIA)
• PyRIT (Microsoft)
• promptfoo
• Giskard LLM scan

Runtime / Firewall

• Protect AI Layer
• Robust Intelligence AI Firewall
• Cloudflare Firewall for AI
• HiddenLayer AISec

Observability + Tracing

• OpenTelemetry GenAI
• Langfuse
• Arize Phoenix
• Helicone

Sandboxing

• E2B / Firecracker
• gVisor
• Daytona Sandboxes
• Modal sandboxes

Supply chain

• Sigstore / Cosign
• Snyk + Dependabot
• Protect AI ModelScan
• Anchore SBOM

The 2026 agent-security tooling landscape, grouped by the layer they harden. Most production teams use 2–4 of these alongside their cloud provider's primitives.

What we recommend by maturity stage

1. Day 1 (prototype): Add a cheap input guardrail (Llama Guard 3 or a Flash-Lite classifier). Wire OpenTelemetry GenAI traces to Langfuse or Arize Phoenix. Use Pydantic / Zod for every tool input.
2. First production deploy: Add an output guardrail with Guardrails AI or NeMo Guardrails. Move code execution into E2B / Firecracker microVMs. Stand up an egress proxy with a small allowlist.
3. Scaling out: Add red-team in CI with Garak or PyRIT. Sign artifacts with Sigstore / Cosign. Run an AI Firewall (Protect AI, Lakera, HiddenLayer) at the edge.
4. Regulated / enterprise: Add tamper-evident audit logs, per-tenant CMK, formal red-team via Microsoft PyRIT or NVIDIA AI Red Team, and continuous evaluations as deploy gates.

Try the controls inside AgentSwarms

Most of the patterns above are runnable in the AgentSwarms notebook lab: the Guardrails (Tripwires) notebook builds an OpenAI-Agents-style input/output guard, the PII Sanitizer notebook is a working middleware shim, and the Failure Modes lab reproduces lethal-trifecta exfil and lets you patch it. The Prompt Injection Tester tool runs a battery of known-bad inputs against any system prompt in your browser.

A checklist you can take to a threat-modelling session

Every agent has its own workload identity and a least-privilege policy.
All retrieved / tool / memory content is wrapped in trust-tagged delimiters.
An input guardrail runs before the expensive model on every call.
Tool inputs validated against a strict schema by a tool broker, not by the model.
Egress is allowlisted; no agent can POST to an arbitrary domain.
Code execution is sandboxed; sandboxes have no persistent storage.
Memory and vector stores are tenant-partitioned at the storage layer.
An output guardrail scrubs PII / known secrets before the response leaves.
Every model call, tool call, and guardrail decision emits an OTel span.
Audit log is append-only and survives a malicious deletion attempt.
Red-team and eval suites run in CI; deploys block on regression.
Per-agent cost / latency / refusal-rate alerts are wired to on-call.

If you can answer yes to every line on a given agent, you're ahead of the median enterprise deployment in 2026. If you can't — that's your roadmap.

Going deeper

Security is a layer of every other concern, not a separate concern. The posts in the related list go one level deeper into each of the patterns we touched here — production architecture, failure modes, MCP, RAG freshness, cost control. The Explore section links to the tools in AgentSwarms you can use to validate each control on your own agent today.

And if you're hiring for or interviewing into a senior agentic-AI role, Agentic AI Interview Questions 2026 now leads with security questions. That's not a coincidence — it's how the market is pricing this discipline.
