DevOps & Infrastructure · Production

Deploy Agents to Bedrock, Azure & Google Cloud — One Open-Source CI/CD Pipeline

A hands-on, copy-paste implementation guide. We take a single example agent and ship it — with keyless auth, an eval gate, containers, canaries, and tracing — to Amazon Bedrock AgentCore, Azure AI Foundry, and Vertex AI Agent Engine, using only open-source tooling. Every command, every config file, start to finish.

AgentSwarms Authors

May 30, 2026 · 29 min read


You prototyped an agent. It works on your laptop. Now your boss wants it in production — on the company's cloud, behind the company's auth, with a deploy story that survives an audit. And the awkward part: nobody on the team agrees on which cloud, or the company runs all three. This guide is the answer we wish existed when we hit that wall: one open-source CI/CD pipeline that takes a single example agent and ships it to Amazon Bedrock AgentCore, Azure AI Foundry, and Google Cloud's Vertex AI Agent Engine — keyless, eval-gated, canaried, and traced. We'll do the first-time setup for each cloud step by step, then unify it into one workflow.

This is the build-it companion to our earlier read, DevOps for Agentic AI: An Open-Source Playbook. That post argued the why and the what — a prompt change is a deploy, gate everything on evals, trace every run. This one is the how, all the way down to the YAML. If you haven't read the playbook, skim it first; we lean on its vocabulary here.

The one idea that makes three-cloud deployment tractable: *build the agent once as a container, and make the cloud the last mile, not the architecture.* Everything before the deploy step — versioning, the eval gate, the image build — is identical across clouds. Only the final deploy command and the IAM trust glue differ. Get that shape right and adding a third cloud is an afternoon, not a rewrite.

What you'll have at the end

A repo with one agent definition, one Dockerfile, one eval suite, and a GitHub Actions workflow whose matrix deploys the same image to AWS, Azure, and GCP — each via short-lived OIDC credentials, each behind a canary, each emitting OpenTelemetry traces to your own Langfuse. Pick one cloud and follow only its section if that's all you need.

The three platforms, decoded

Each major cloud now ships a managed agent runtime: a place to run an agent's reasoning loop without you babysitting servers, wired to that cloud's models, memory, tools, and tracing. They use wildly different names for the same handful of primitives. Here's the Rosetta Stone; the rows never change from cloud to cloud, only the product names:

Amazon Bedrock AgentCore (the AWS column; Azure and Google Cloud fill the same seven rows with their own product names)

⚙️ Managed runtime: AgentCore Runtime (serverless, per-session microVM)
🔐 Identity & auth: AgentCore Identity + IAM roles
🧠 Memory / state: AgentCore Memory (short + long term)
🔌 Tool gateway: AgentCore Gateway (MCP) + Browser/Code tools
📊 Observability: AgentCore Observability → CloudWatch / X-Ray
📦 Container registry: Amazon ECR
🚀 Deploy tooling: agentcore starter toolkit + Terraform / CDK

Same primitives, three vocabularies. Build against the concepts in the left column and the per-cloud differences shrink to a thin adapter.

Amazon Bedrock AgentCore

AgentCore is a set of composable services: Runtime (a serverless, session-isolated place to run any framework — Strands, LangGraph, CrewAI — as a container), Gateway (turns APIs and Lambdas into MCP tools), Identity, Memory, Observability, plus managed Browser and Code-Interpreter tools. You bring a container that speaks a simple HTTP contract; AgentCore runs it, scales it to zero, and isolates each session in its own microVM. It's framework-agnostic by design, which is exactly what you want for a portable pipeline.

Azure AI Foundry Agent Service

Foundry is Microsoft's umbrella for building and running agents. The Agent Service gives you hosted agents with threads, tools (via Connections, Logic Apps, or OpenAPI specs), and knowledge — and for custom code you containerize the agent and run it on Azure Container Apps, which gives you revisions and built-in traffic splitting for canaries. Auth is Microsoft Entra managed identity end to end; tracing flows to Application Insights.

Google Cloud Vertex AI Agent Engine

Vertex AI Agent Engine is a managed runtime for agents you build with the open-source Agent Development Kit (ADK) — though it also accepts LangGraph and others. You hand it your agent object plus a requirements list; it packages, deploys, and gives you sessions and a Memory Bank. For full control you can deploy the same container to Cloud Run instead. Auth is IAM service accounts; from CI you use Workload Identity Federation so no key ever leaves Google.

Don't let the runtime pick your framework

All three accept a plain container that exposes an HTTP endpoint. If you keep your agent framework-neutral and containerized, the managed runtime becomes an implementation detail you can switch — which is the whole point of building the pipeline this way.

The portable pipeline

Before any cloud-specific work, here's the pipeline every deploy flows through. Six of its eight stages are byte-for-byte identical no matter where you ship; only Deploy and the Auth glue change:

Commit → build → eval gate → package → keyless auth → deploy → canary → observe.

📝 Commit (git + PR review): agent definition (YAML), prompts, tools, and IaC all land in one repo. A PR is the unit of change.

Everything left of 'keyless auth' is cloud-agnostic and runs once. That's why a third cloud is cheap to add.

And here's the reference architecture that pipeline produces. The shape is constant: developer → CI with an eval gate → keyless OIDC → registry → managed runtime wired to a model, tools, and tracing. Only the box labels change per cloud; the AWS labels are shown here:

Dev / CI: developer → git push → GitHub Actions (build · eval gate) → 🔑 OIDC token exchanged at AWS STS (AssumeRoleWithWebIdentity)

↓ short-lived credentials

☁ Amazon Bedrock AgentCore
📦 Amazon ECR (image @ commit SHA)
↓ deploy / canary
⚙️ AgentCore Runtime, wired to:
🤖 Bedrock models (Claude, Nova…)
🔌 AgentCore Gateway + Memory
📊 CloudWatch / X-Ray

One architecture, three label sets. Everything inside the cloud boundary is managed for you. The arrow from CI into the cloud is a short-lived credential, never a stored key.

First-time foundations (do these once)

Resist the urge to git push straight at a cloud. There are eight foundations you set up once, before your first real deploy, and they save you from the failure modes that take down first deploys. The two that carry the most weight, keyless OIDC and an eval gate in CI, are also the two most often skipped under deadline pressure. Don't. The sections below walk through five of them.

1 · One repo, one shape

Everything that defines the agent's behaviour lives in one repository. The agent is a single declarative artifact (a YAML file), the prompt is a versioned file beside it, the eval suite is in the repo, and the cloud wiring is Terraform. Here's the layout we'll build:

refund-triage/
├── agent/
│   ├── refund-triage.v3.yaml      # the agent definition — model, tools, guardrails
│   ├── agent.py                   # framework-neutral entrypoint (HTTP handler)
│   └── prompts/refund-triage.v3.md
├── evals/
│   ├── golden.jsonl               # 80 representative cases with known-good answers
│   └── run.py                     # promptfoo + RAGAS gate, exits non-zero on regression
├── Dockerfile                     # builds ONE image for all three clouds
├── infra/
│   ├── aws/    (ecr.tf, iam-oidc.tf, agentcore.tf)
│   ├── azure/  (acr.bicep, containerapp.bicep, federated-cred.tf)
│   └── gcp/    (artifact-registry.tf, wif.tf, agent-engine.tf)
├── deploy/                        # aws.sh, azure.sh, gcp.sh: the last-mile scripts the matrix calls
└── .github/workflows/deploy.yml   # the one pipeline, matrixed over clouds

2 · The agent as a declarative artifact

If a thing can change the agent's behaviour and it isn't in git, you don't have CI/CD — you have folklore. The model, its temperature, the tools, the guardrails: all of it goes in one file with a version number.

# agent/refund-triage.v3.yaml
agent:
  name: refund-triage
  version: 3
  model:
    # 'provider' is resolved per-cloud at deploy time → Bedrock | Foundry | Vertex
    family: claude-sonnet        # mapped to the closest model on each cloud
    temperature: 0.1
    max_tokens: 512
  system_prompt: prompts/refund-triage.v3.md
  tools:
    - read_orders                # read-only on the orders table
    - issue_refund               # behind a human-approval gate
  guardrails:
    pii_redaction: true
    max_iterations: 6            # hard loop cap — your cost circuit-breaker
  budgets:
    usd_per_request: 0.05

3 · One Dockerfile, three destinations

All three runtimes accept a container that listens on a port and answers an invoke request. Build it once; the registry it lands in is the only variable. The contract AgentCore expects is the strictest (a /invocations POST and a /ping health check on port 8080), so we satisfy that and it works everywhere.

# Dockerfile — one image for AWS, Azure, and GCP
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY agent/ ./agent/
# Agent listens on 8080: POST /invocations  +  GET /ping
ENV PORT=8080
EXPOSE 8080
CMD ["python", "-m", "agent.agent"]
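For reference, the contract the Dockerfile relies on can be met with nothing but the standard library. The sketch below shows one possible shape for agent/agent.py: POST /invocations and GET /ping on a configurable port. The handle_invocation body, the "output" response key, and the exact /ping response body are assumptions for illustration; check the runtime's documentation for the fields it actually expects.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def handle_invocation(payload: dict) -> dict:
    # Placeholder logic: a real agent would run its reasoning loop here.
    prompt = payload.get("prompt", "")
    return {"output": f"received {len(prompt)} characters"}


class AgentHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, body: dict) -> None:
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        if self.path == "/ping":
            self._send_json(200, {"status": "Healthy"})  # body shape is assumed
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != "/invocations":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and undecodable bytes
            payload = None
        if not isinstance(payload, dict):
            self._send_json(400, {"error": "body must be a JSON object"})
            return
        self._send_json(200, handle_invocation(payload))

    def log_message(self, format, *args) -> None:
        pass  # keep the sketch quiet; use real logging in production


def build_server(port: int = 8080) -> ThreadingHTTPServer:
    return ThreadingHTTPServer(("0.0.0.0", port), AgentHandler)
```

To serve, the module's entrypoint would call build_server(8080).serve_forever(). Keeping handle_invocation separate from the HTTP plumbing means the eval suite can also call it directly in unit tests.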

4 · The eval gate (cloud-independent)

This is the stage that earns its keep. Before any cloud sees the image, the golden dataset runs in CI. A regression on faithfulness or a budget breach exits non-zero and the deploy never happens. Because it runs against the local container, it's identical for every cloud — write it once.

# evals/run.py: runs in CI, exits 1 on regression (blocks the deploy)
import json, sys
from ragas import evaluate
from ragas.metrics import faithfulness, answer_correctness

# One JSON object per line; skip blank lines and close the file when done
with open("evals/golden.jsonl") as f:
    cases = [json.loads(line) for line in f if line.strip()]

# run_agent_over is a helper you supply: it sends each case to the local
# container and returns a dataset in the shape ragas.evaluate accepts
results = run_agent_over(cases)
score = evaluate(results, metrics=[faithfulness, answer_correctness])

THRESHOLDS = {"faithfulness": 0.85, "answer_correctness": 0.80}
failed = [m for m, t in THRESHOLDS.items() if score[m] < t]
if failed:
    print(f"❌ Eval gate failed: {failed} below threshold")
    sys.exit(1)
print("✅ Eval gate passed, clear to deploy")
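The gate script calls run_agent_over without defining it. One way it might look is sketched below: post each golden case to the container's /invocations endpoint and pair the answer with the known-good one. The golden.jsonl field names ("input", "expected"), the "output" response key, and the localhost URL are assumptions of this sketch, and the rows would still need converting into whatever dataset type your RAGAS version expects (faithfulness also needs retrieved contexts, which are not modelled here).

```python
import json
import urllib.request
from typing import Callable, Iterable

AGENT_URL = "http://localhost:8080/invocations"  # the container started in CI


def invoke_local(prompt: str, url: str = AGENT_URL, timeout: float = 60.0) -> str:
    """POST one prompt to the local container and return its answer text."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["output"]  # 'output' key is assumed


def run_agent_over(
    cases: Iterable[dict], invoke: Callable[[str], str] = invoke_local
) -> list[dict]:
    """Run every golden case and pair the agent's answer with the expected one."""
    rows = []
    for case in cases:
        rows.append(
            {
                "question": case["input"],         # assumed golden.jsonl field
                "answer": invoke(case["input"]),
                "ground_truth": case["expected"],  # assumed golden.jsonl field
            }
        )
    return rows
```

Passing invoke as a parameter lets you test the harness itself with a stub, without a running container.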

5 · Keyless auth — the foundation that matters most

Never put a long-lived cloud key in a GitHub secret. Instead, your CI job mints a short-lived OIDC token that describes it (this repo, this branch), and the cloud, having been told to trust exactly that, hands back temporary credentials. The key never exists, so it can never leak. The handshake between GitHub Actions and the cloud's STS runs in five steps:

1. GitHub Actions: the job starts with permissions: id-token: write.
2. GitHub OIDC: mints a short-lived JWT describing this repo + branch.
3. Cloud STS: verifies the token against a trust policy you configured.
4. Cloud STS: returns temporary credentials (minutes, not forever).
5. Deploy step: uses the temporary credentials to push the image and deploy the agent.

No long-lived key is ever stored in CI. The same pattern works on all three clouds; only the verb differs: AWS AssumeRoleWithWebIdentity, Entra federated credentials, GCP Workload Identity Federation.
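When a trust policy rejects your job, it helps to see exactly which claims the token carries. The sketch below decodes the payload of a JWT so you can inspect sub and aud. It deliberately does not verify the signature, since verification is the cloud's job; treat it as a debugging aid only. The function name is hypothetical.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

Inside a job that has id-token: write, GitHub exposes the request URL and bearer token as the ACTIONS_ID_TOKEN_REQUEST_URL and ACTIONS_ID_TOKEN_REQUEST_TOKEN environment variables; fetch a token from there, pass it to jwt_claims, and compare the sub value character for character with the one in your trust policy.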

The one mistake that undoes it all

When you configure the trust policy, scope the subject to a specific repo and branch (e.g. repo:acme/refund-triage:ref:refs/heads/main). A wildcard here means any repo in your org — or worse, any repo on GitHub — could assume your deploy role. Tighten it before you wire anything else.
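To make the wildcard risk concrete, here is a small illustration of how an exact subject and two wildcard patterns treat different callers. It uses glob matching from the standard library as a stand-in for a cloud's wildcard condition operator; the real evaluation happens inside the cloud's policy engine, so this is a model of the behaviour, not the implementation.

```python
from fnmatch import fnmatchcase


def subject_allowed(sub: str, pattern: str) -> bool:
    """Glob-style match, similar in spirit to a wildcard trust condition."""
    return fnmatchcase(sub, pattern)


EXACT = "repo:acme/refund-triage:ref:refs/heads/main"  # what you want
ORG_WILDCARD = "repo:acme/*"                           # any repo, any branch in the org
ANY_WILDCARD = "repo:*"                                # any repo on GitHub
```

With EXACT, a feature branch of the same repo is rejected. With ORG_WILDCARD, every repository in the org can assume the role, and with ANY_WILDCARD so can a stranger's fork.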

The worked example

Our example is refund-triage: a customer-support agent that reads an order, decides whether a refund request meets policy, and either drafts an approval (behind a human gate) or a polite denial with the reason. It uses one read-only tool and one gated write tool — small enough to follow, real enough to expose every deployment concern. We'll ship this agent to all three clouds. Pick your cloud below, or read all three.

Deploy to Amazon Bedrock AgentCore

AgentCore Runtime takes your container from ECR and runs it serverlessly. The first-time setup is three things: an ECR repository, an execution role the runtime assumes (to call Bedrock models and write logs), and the GitHub→AWS trust so CI is keyless.

Step 1 — ECR + OIDC trust + execution role (Terraform)

# infra/aws/iam-oidc.tf — trust GitHub Actions, keyless
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["1c58a3a8518e8759bf075b76b750d4f2df264fcd"]
}

data "aws_iam_policy_document" "trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      # scope to ONE repo + branch — never a wildcard
      values   = ["repo:acme/refund-triage:ref:refs/heads/main"]
    }
  }
}

resource "aws_iam_role" "ci_deployer" {
  name               = "agentcore-ci-deployer"
  assume_role_policy = data.aws_iam_policy_document.trust.json
}

resource "aws_ecr_repository" "agent" {
  name                 = "refund-triage"
  image_tag_mutability = "IMMUTABLE"   # tags are commit SHAs, never reused
}

Attach a least-privilege policy to ci_deployer — permission to push to this one ECR repo and to call the AgentCore control-plane APIs, nothing more. Separately, create an execution role the runtime itself assumes at request time, scoped to bedrock:InvokeModel on the specific model and logs:PutLogEvents. Two roles, two jobs: one deploys, one runs.
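As a rough sketch, the execution role's permissions described above could be generated as a policy document like this. The function name, the model ID, and the log group ARN are placeholders you would replace; a working runtime will likely need a few more log actions (creating streams, for instance), and models served through inference profiles use a different resource ARN, so treat this as a starting shape, not a complete policy.

```python
import json


def execution_policy(region: str, model_id: str, log_group_arn: str) -> dict:
    """Build a policy allowing one model invocation target and log writes."""
    model_arn = f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeOneModel",
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel"],
                "Resource": [model_arn],
            },
            {
                "Sid": "WriteLogs",
                "Effect": "Allow",
                "Action": ["logs:PutLogEvents"],
                # ':*' covers the log streams inside this one log group
                "Resource": [f"{log_group_arn}:*"],
            },
        ],
    }
```

Calling json.dumps(execution_policy(...), indent=2) yields a document you can paste into Terraform's policy argument or review in a PR. The point is that both Resource lists name one thing each, never "*".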

Step 2 — package & launch with the starter toolkit

The open-source bedrock-agentcore-starter-toolkit wraps the build-push-deploy dance. Locally, the very first time, you'd run it interactively to confirm it works before wiring CI:

pip install bedrock-agentcore-starter-toolkit

# Generates the runtime config from your entrypoint + execution role
# (AgentCore runtime names allow letters, digits, and underscores, so the hyphen becomes an underscore)
agentcore configure \
  --entrypoint agent/agent.py \
  --execution-role arn:aws:iam::123456789012:role/refund-triage-exec \
  --name refund_triage

# Builds the container, pushes to ECR, deploys to AgentCore Runtime
agentcore launch

# Smoke-test the deployed agent
agentcore invoke '{"prompt": "Customer wants a refund on order 5512, 40 days late"}'

Step 3 — the AWS deploy job

# .github/workflows/deploy.yml (AWS branch of the matrix)
permissions:
  id-token: write      # REQUIRED for OIDC
  contents: read
steps:
  - uses: actions/checkout@v4
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/agentcore-ci-deployer
      aws-region: us-east-1                       # no keys — pure OIDC
  - name: Deploy to AgentCore
    run: |
      pip install bedrock-agentcore-starter-toolkit
      agentcore configure --entrypoint agent/agent.py \
        --execution-role $EXEC_ROLE_ARN --name refund_triage --non-interactive
      agentcore launch --tag $GITHUB_SHA

AgentCore canaries

AgentCore Runtime supports endpoint versions. Deploy the new version as a non-default endpoint, send a slice of traffic, watch CloudWatch + your Langfuse, then promote it to default. That's your canary; see the progressive rollout section further down.

Deploy to Azure AI Foundry

On Azure we run the container on Azure Container Apps (which gives revisions + traffic splitting for free) and register it as a custom agent in a Foundry project. First-time setup: an Azure Container Registry, an Entra app registration with a federated credential for GitHub, and a resource group.

Step 1 — federated credential (keyless GitHub→Entra)

# One-time: create an app registration and federate it to GitHub
APP_ID=$(az ad app create --display-name refund-triage-ci --query appId -o tsv)
az ad sp create --id $APP_ID

az ad app federated-credential create --id $APP_ID --parameters '{
  "name": "github-main",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:acme/refund-triage:ref:refs/heads/main",
  "audiences": ["api://AzureADTokenExchange"]
}'

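# Grant the SP access on the resource group only. Contributor keeps the first run simple
# but is broader than least-privilege; narrow it once the pipeline works
az role assignment create --assignee $APP_ID \
  --role "Contributor" --scope /subscriptions/$SUB/resourceGroups/rg-agents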

Step 2 — build in ACR and deploy to Container Apps

az acr build builds the image in the cloud (no local Docker needed in CI), and az containerapp up creates or updates the app with a fresh revision. The first manual run confirms the contract before you automate:

# Build the same Dockerfile, in ACR, tagged with the commit
az acr build -r acmeagents -t refund-triage:$GITHUB_SHA .

# Create/update the Container App — system-assigned identity, external ingress
az containerapp up \
  --name refund-triage \
  --resource-group rg-agents \
  --image acmeagents.azurecr.io/refund-triage:$GITHUB_SHA \
  --ingress external --target-port 8080 \
  --system-assigned

# Register the running endpoint as a custom tool/agent in your Foundry project
# (via the azure-ai-projects SDK or the Foundry portal Connections tab)

Step 3 — the Azure deploy job

# .github/workflows/deploy.yml (Azure branch of the matrix)
permissions:
  id-token: write
  contents: read
steps:
  - uses: actions/checkout@v4
  - uses: azure/login@v2
    with:
      client-id: ${{ secrets.AZURE_CLIENT_ID }}        # the app reg, not a key
      tenant-id: ${{ secrets.AZURE_TENANT_ID }}
      subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
  - name: Build & deploy
    run: |
      az acr build -r acmeagents -t refund-triage:$GITHUB_SHA .
      az containerapp up --name refund-triage --resource-group rg-agents \
        --image acmeagents.azurecr.io/refund-triage:$GITHUB_SHA \
        --ingress external --target-port 8080 --system-assigned

Azure canaries are revisions

Container Apps, in multiple-revision mode, keeps several revisions live and lets you split traffic between them by percentage. Deploy v2 with --revision-suffix $GITHUB_SHA, set it to 10% traffic, watch App Insights, then shift to 100%. The progressive rollout section below maps to this directly.

Deploy to Google Cloud Vertex AI

On GCP you have two clean paths: hand the agent to Agent Engine (fully managed, ADK-native) or run the container on Cloud Run (revisions + traffic tags, like Container Apps). We'll show Agent Engine for the managed path. First-time setup: an Artifact Registry repo, a service account, and Workload Identity Federation so GitHub is keyless.

Step 1 — Workload Identity Federation (keyless GitHub→GCP)

# One-time: a pool + provider that trusts your GitHub repo
gcloud iam workload-identity-pools create github \
  --location=global --display-name="GitHub Actions"

gcloud iam workload-identity-pools providers create-oidc github-provider \
  --location=global --workload-identity-pool=github \
  --issuer-uri="https://token.actions.githubusercontent.com" \
  --attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository" \
  --attribute-condition="assertion.repository=='acme/refund-triage' && assertion.ref=='refs/heads/main'"

# Let the GitHub identity impersonate a least-privilege deploy SA
gcloud iam service-accounts add-iam-policy-binding \
  agent-deployer@PROJECT.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="principalSet://iam.googleapis.com/projects/123/locations/global/workloadIdentityPools/github/attribute.repository/acme/refund-triage"

Step 2 — build, then deploy to Agent Engine

# Build the same Dockerfile via Cloud Build → Artifact Registry
gcloud builds submit \
  --tag us-docker.pkg.dev/PROJECT/agents/refund-triage:$GITHUB_SHA

# infra/gcp/deploy_agent_engine.py — managed runtime, ADK-native
import vertexai
from vertexai import agent_engines
from agent.agent import root_agent          # your ADK / LangGraph app object

vertexai.init(
    project="PROJECT", location="us-central1",
    staging_bucket="gs://acme-agent-staging",
)

remote = agent_engines.create(
    agent_engine=root_agent,
    requirements=["google-cloud-aiplatform[agent_engines,adk]"],
    display_name="refund-triage",
)
print("Deployed:", remote.resource_name)   # capture for the canary step

Step 3 — the GCP deploy job

# .github/workflows/deploy.yml (GCP branch of the matrix)
permissions:
  id-token: write
  contents: read
steps:
  - uses: actions/checkout@v4
  - uses: google-github-actions/auth@v2
    with:
      workload_identity_provider: projects/123/locations/global/workloadIdentityPools/github/providers/github-provider
      service_account: agent-deployer@PROJECT.iam.gserviceaccount.com
  - uses: google-github-actions/setup-gcloud@v2
  - name: Build & deploy
    run: |
      gcloud builds submit --tag us-docker.pkg.dev/PROJECT/agents/refund-triage:$GITHUB_SHA
      python infra/gcp/deploy_agent_engine.py

Cloud Run if you want raw control

Prefer gcloud run deploy --no-traffic --tag $GITHUB_SHA then gcloud run services update-traffic --to-tags $GITHUB_SHA=10. That gives you the exact same revision-based canary as Azure Container Apps, with the same container — handy if you want one mental model across Azure and GCP.

One pipeline, three targets

Now stitch it together. The build and eval-gate jobs run once; the deploy job fans out over a matrix of clouds, and each matrix leg uses only its own OIDC login. Crucially, deploy needs the eval gate — a red gate blocks all three clouds at once. That single dependency edge is the most important line in the file.

# .github/workflows/deploy.yml — the whole shape
name: ship-agent
on:
  push: { branches: [main] }

permissions: { id-token: write, contents: read }

jobs:
  build-and-gate:
    runs-on: ubuntu-latest
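    steps:
      - uses: actions/checkout@v4
      - run: docker build -t refund-triage:$GITHUB_SHA .
      - name: Start the agent container so the evals have something to hit
        run: docker run -d --name agent -p 8080:8080 refund-triage:$GITHUB_SHA
      - name: Eval gate (cloud-independent)
        run: |
          pip install ragas               # plus anything else evals/run.py imports
          python evals/run.py             # exits 1 on regression → blocks deploy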

  deploy:
    needs: build-and-gate               # ← no green gate, no deploy. Anywhere.
    runs-on: ubuntu-latest
    strategy:
      matrix:
        cloud: [aws, azure, gcp]
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to ${{ matrix.cloud }}
        run: ./deploy/${{ matrix.cloud }}.sh $GITHUB_SHA

Why this scales to a fourth cloud

Adding OCI, a private vLLM cluster, or anything else is now: write one deploy/<name>.sh, add one line to the matrix, configure one OIDC trust. The build, the eval gate, the image, and the agent definition don't move. That's the entire payoff of treating the cloud as the last mile.

Progressive rollout, everywhere

A green eval gate means the change is good on your golden set, not that it's good on live traffic. So no deploy goes straight to 100%. Each cloud has a native primitive for shifting a percentage of traffic to the new version; your job is to watch the KPIs at each step and roll back automatically if they dip.

The ramp starts with all traffic on v1 (stable, 100%) and v2 (new) at 0%. Deploy v2 to 5% to begin the canary, then climb; the moment KPIs dip, traffic snaps back to v1. The platform shifts the percentage; your eval + KPI guard decides whether to climb or roll back. AgentCore endpoint versions, Azure Container Apps revision traffic splits, and Cloud Run revision tags all implement the same idea:

AWS — deploy the new build as a non-default AgentCore endpoint version; route a slice with your gateway/router; promote to default when CloudWatch + Langfuse stay healthy.
Azure — az containerapp ingress traffic set with a revision weight (--revision-weight latest=10); App Insights watches the canary; shift to 100% or back to 0%.
GCP — gcloud run services update-traffic --to-tags $SHA=10 (Cloud Run), or stage a new Agent Engine version behind your router; Cloud Trace + evals decide promotion.
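The control loop behind all three is the same, and a minimal sketch of it follows. The ramp percentages are illustrative, and set_traffic and kpis_healthy are hypothetical callables you would back with the cloud's traffic-splitting command and your own KPI query respectively.

```python
from typing import Callable, Sequence

RAMP = (5, 25, 50, 100)  # percent of traffic on v2 at each step (illustrative)


def run_canary(
    set_traffic: Callable[[int], None],
    kpis_healthy: Callable[[int], bool],
    ramp: Sequence[int] = RAMP,
) -> bool:
    """Climb the ramp; on the first unhealthy step send all traffic back to v1."""
    for percent in ramp:
        set_traffic(percent)
        if not kpis_healthy(percent):
            set_traffic(0)  # auto-rollback: v2 gets nothing
            return False
    return True
```

In practice kpis_healthy would wait for a soak period and then compare error rate, latency, and eval scores on sampled live traffic against the v1 baseline before letting the ramp continue.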

Observability that spans clouds

If your traces live in three different consoles, you'll debug in none of them. The fix: instrument the agent once with OpenTelemetry (via OpenLLMetry), and fan the spans out to both your own Langfuse — your single pane of glass across all clouds — and each cloud's native backend. Because the instrumentation is in the container, it travels with the image to every target.

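# agent/agent.py: instrument once, traces flow everywhere
import base64, os
from traceloop.sdk import Traceloop

# Langfuse's OTLP endpoint expects Basic auth built from the project key pair
langfuse_auth = base64.b64encode(
    f"{os.environ['LANGFUSE_PUBLIC_KEY']}:{os.environ['LANGFUSE_SECRET_KEY']}".encode()
).decode()

Traceloop.init(
    app_name="refund-triage",
    # OTLP endpoint → your own Langfuse, identical on every cloud
    api_endpoint="https://langfuse.acme.dev/api/public/otel",
    headers={"Authorization": f"Basic {langfuse_auth}"},
)
# Each cloud ALSO captures spans natively:
#   AWS   → AgentCore Observability → CloudWatch / X-Ray
#   Azure → Application Insights
#   GCP   → Cloud Trace
# Same OTel spans, two destinations. Debug in Langfuse, audit in the cloud console.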

Tag every span with the deployed SHA

Put git.sha, agent.version, and cloud on every span. When a user reports a bad answer, you want to know in one query exactly which image, which prompt version, and which cloud served it. Untagged traces are the difference between a five-minute fix and a five-hour hunt.
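One way to get those three tags onto every span is to attach them once as OpenTelemetry resource attributes and skip per-span tagging. The sketch below builds them from environment variables. GITHUB_SHA is set by GitHub Actions, while AGENT_VERSION and DEPLOY_CLOUD are hypothetical names, and all three are assumed to be passed into the container by the deploy step.

```python
import os
from typing import Mapping


def deploy_attributes(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the deploy identity that every span should carry."""
    return {
        "git.sha": env.get("GITHUB_SHA", "unknown"),
        "agent.version": env.get("AGENT_VERSION", "unknown"),  # hypothetical var
        "cloud": env.get("DEPLOY_CLOUD", "unknown"),           # hypothetical var
    }


def as_otel_resource_env(attrs: Mapping[str, str]) -> str:
    """Format attributes for the standard OTEL_RESOURCE_ATTRIBUTES variable."""
    return ",".join(f"{key}={value}" for key, value in attrs.items())
```

You can either export the formatted string as OTEL_RESOURCE_ATTRIBUTES in the container environment or, if your OpenLLMetry version accepts a resource_attributes argument on Traceloop.init, pass the dictionary there. Check which of the two your SDK version honours.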

Cloud-specific gotchas

Model access isn't automatic. On AWS you must enable a Bedrock model in the region; on Azure you deploy a model to your Foundry project; on GCP you enable the Vertex API and request quota. A perfect pipeline still fails if the model isn't switched on in the region you deploy to.
Cold starts are real. Scale-to-zero runtimes (AgentCore, Cloud Run, Container Apps min-replicas=0) save money but add first-request latency. Set a minimum instance count for latency-sensitive agents.
Region availability differs. The newest agent runtimes and models roll out region by region. Pin your deploy region to one where both the runtime and your chosen model are GA — don't assume us-east-1 parity across clouds.
Egress and data residency. Calling your own Langfuse or a third-party tool from inside the runtime crosses a network boundary. Check egress rules and, for regulated data, that the runtime region matches your residency requirements.
Least-privilege is per-cloud. An over-broad Contributor, roles/editor, or * IAM policy is the most common audit finding. Scope the deploy identity to the one registry + one runtime it touches, and the execution identity to the one model it calls.
Immutable image tags. Tag images with the commit SHA and never reuse a tag. latest is how you ship a different artifact than the one you evaluated.
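The last gotcha is easy to enforce mechanically. As a sketch, a deploy script could refuse any image reference whose tag is not a full commit SHA. The function names are hypothetical, and digest references (repo@sha256:...) are deliberately not handled here.

```python
import re

SHA_TAG = re.compile(r"^[0-9a-f]{40}$")  # a full git commit SHA


def image_tag(image_ref: str) -> str:
    """Return the tag portion of 'registry/repo:tag' (empty string if none)."""
    name = image_ref.rsplit("/", 1)[-1]  # ignore any ':port' in the registry host
    return name.split(":", 1)[1] if ":" in name else ""


def is_sha_pinned(image_ref: str) -> bool:
    """True only when the image is tagged with a 40-character commit SHA."""
    return bool(SHA_TAG.match(image_tag(image_ref)))
```

Running this check at the top of each deploy script means an image tagged latest, or not tagged at all, fails fast before any cloud API is called.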

Your first deploy, in an afternoon

You don't need all three clouds on day one. Here's the honest minimum path to one real, gated, traced deploy:

1. Hour 1: Lay out the repo: agent YAML, agent.py, Dockerfile, and 20 golden cases (you'll grow them later). Get the container running and answering locally.
2. Hour 2: Pick one cloud. Set up its registry, the OIDC trust (scoped to your repo + branch), and a least-privilege deploy role. This is the part that feels slow and is worth every minute.
3. Hour 3: Write evals/run.py and the single-cloud deploy job. Push to a branch, watch the gate run, then watch the deploy. Smoke-test the live agent.
4. Hour 4: Add OpenLLMetry, point it at Langfuse, tag spans with the SHA, and wire one cost/latency alert. Now you can see what you shipped.
5. Next week: Add the canary step, grow the golden set from real traffic, and only then fan the matrix out to a second cloud. The second one takes an hour because everything but the deploy script already exists.

How this relates to the vendor playbooks

Microsoft, AWS, and Google each publish their own CI/CD-for-agents guidance, and they're worth reading — they frame the problem the same way we do: an agent is a deployable artifact that needs versioning, an eval gate, and staged rollout. Where the vendor guides lean on Foundry + Azure DevOps + Bicep, or CodePipeline + CDK, or Cloud Build + Vertex pipelines, this guide deliberately keeps the pipeline open-source and portable — git, GitHub Actions, Docker, promptfoo, RAGAS, Langfuse, OpenLLMetry, Terraform — so the only thing that's cloud-specific is the last-mile deploy. Read the vendor docs for the deepest per-cloud features; use this shape so you're never locked into one.

The shortest possible summary

Build the agent once as a container. Gate every change on a cloud-independent eval suite. Authenticate with keyless OIDC, never a stored key. Push the SHA-tagged image to the cloud's registry, deploy to its managed runtime, canary before 100%, and emit one set of OpenTelemetry spans to your own Langfuse. The cloud is the last mile — keep it that way and three targets cost barely more than one.

Deploying agents to Bedrock, Azure, and Vertex isn't three projects — it's one pipeline with three thin adapters. The teams shipping reliably across clouds in 2026 aren't writing three pipelines; they're writing one, gating it hard, and letting the matrix do the fan-out. Start with one cloud this afternoon, get the gate and the trace green, and the second and third clouds will feel like a formality.
