MCP · Cloud Deployment · Tutorial

Deploy a Custom MCP Server to AWS, Azure, and GCP: A Runnable Python & TypeScript Guide

Not another “what is MCP” explainer. This is the copy-paste cookbook: build a custom MCP server in Python and in TypeScript, containerize it once, and ship it to Google Cloud Run, Azure Container Apps, and Amazon ECS Express Mode — with commands that actually run.

AgentSwarms Authors
July 5, 2026 · 24 min read

There is no shortage of writing on what the Model Context Protocol is. What's strangely hard to find is the thing most people actually get stuck on: how do you take a custom MCP server off your laptop and run it in the cloud, reachable by real agents, in a way that doesn't fall over the first time two requests arrive at once? That's the entire subject of this post. We'll build the same server twice — once in Python, once in TypeScript — put it in a container, and deploy it three ways: Google Cloud Run, Azure Container Apps, and AWS via Amazon ECS Express Mode. Every command here is meant to be run, not admired.

To keep this focused, we're deliberately not re-covering ground. If you want the concepts — what tools, resources, and prompts are, and how to design a tool an agent can actually use — read Building MCP Servers That Agents Can Actually Use. If you want the shipping-it-safely discipline and a production checklist, read The MCP Server You Actually Ship. This guide is the cloud deployment cookbook those two point at.

The one decision that determines everything: transport

Before a single line of code, you have to pick a transport — how the client and your server talk. This is the decision that quietly decides whether your server can live in the cloud at all. There are two that matter.

stdio runs your server as a local subprocess: the client launches it and they speak over standard input and output. It's brilliant for local development and Claude Desktop, and completely unreachable over a network. Streamable HTTP runs your server as a long-lived web service with a single /mcp endpoint that accepts POST requests and can stream responses back over Server-Sent Events. That is the transport you deploy. Everything in this guide targets it.
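To make the difference concrete, here is a minimal Python sketch (standard library only) of one JSON-RPC initialize message framed both ways: as a single newline-terminated line for stdio, and as the body of a POST to the /mcp endpoint for Streamable HTTP. The helper names and the localhost URL are illustrative assumptions; building the Request object sends nothing over the network.

```python
import json
import urllib.request

# A JSON-RPC 2.0 initialize request: identical regardless of transport.
INITIALIZE = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-06-18",
        "capabilities": {},
        "clientInfo": {"name": "sketch", "version": "0"},
    },
}


def frame_for_stdio(message: dict) -> bytes:
    """stdio: one compact JSON message per line, written to the server's stdin."""
    # json.dumps escapes newlines inside strings, so the only raw newline is the delimiter.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def request_for_http(url: str, message: dict) -> urllib.request.Request:
    """Streamable HTTP: the same message as the body of a POST to the /mcp endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(message).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The client must accept both a plain JSON reply and an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )


if __name__ == "__main__":
    print(frame_for_stdio(INITIALIZE))
    req = request_for_http("http://localhost:8080/mcp", INITIALIZE)
    print(req.get_method(), req.full_url)
```

Passing the Request to urllib.request.urlopen would actually send it; the point here is only that the payload is the same and the framing is what differs.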

Rule of thumb: if the server will only ever run on your laptop, use stdio. The client launches it as a subprocess and talks over stdin/stdout, with no networking and no auth. If the answer isn't “on my laptop,” you want Streamable HTTP, and for anything autoscaled, the stateless flavor of it.

The server, in Python (FastMCP)

The official Python SDK ships FastMCP, a high-level API that turns a plain function into an MCP tool with a decorator. Here's a complete server that exposes two tools and serves them over Streamable HTTP. Three settings make it cloud-ready: it binds to 0.0.0.0, it reads the port from the environment (cloud runtimes tell your container which port to listen on), and it runs in stateless mode so that any instance can answer any request.

# server.py: a custom MCP server exposed over Streamable HTTP.
import os
from mcp.server.fastmcp import FastMCP
from starlette.requests import Request
from starlette.responses import PlainTextResponse

mcp = FastMCP(
    "weather-mcp",
    host=os.environ.get("HOST", "0.0.0.0"),    # listen on every interface in the container
    port=int(os.environ.get("PORT", "8080")),  # cloud runtimes inject PORT
    stateless_http=True,                       # no per-instance session state, so it scales out
)

@mcp.tool()
def get_forecast(city: str) -> str:
    """Return a short weather forecast for a city."""
    # Swap this for a real API call, e.g. httpx.get(...).
    return f"Forecast for {city}: 24°C, clear skies."

@mcp.tool()
def convert_temp(celsius: float) -> str:
    """Convert a Celsius temperature to Fahrenheit."""
    return f"{celsius}°C = {celsius * 9 / 5 + 32:.1f}°F"

@mcp.custom_route("/health", methods=["GET"])
async def health(_request: Request) -> PlainTextResponse:
    """Health check endpoint for the cloud platform."""
    return PlainTextResponse("ok")

if __name__ == "__main__":
    # Serves MCP at http://<host>:<port>/mcp — plus /health for the platform.
    mcp.run(transport="streamable-http")
# requirements.txt
mcp>=1.9.0

Run it locally in two commands. It comes up on http://localhost:8080/mcp.

pip install -r requirements.txt
python server.py

The same server, in TypeScript (MCP SDK)

The TypeScript SDK gives you McpServer plus a StreamableHTTPServerTransport you mount on any HTTP framework. We'll use Express. The single most important choice here is to run stateless: build a fresh server and transport for every request. That one decision is what lets the same code scale across many container instances later — we'll come back to why it matters.

{
  "name": "weather-mcp",
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "build": "tsc",
    "start": "node dist/server.js",
    "dev": "tsx server.ts"
  },
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.12.0",
    "express": "^5.0.0",
    "zod": "^3.23.8"
  },
  "devDependencies": {
    "@types/express": "^5.0.0",
    "@types/node": "^22.0.0",
    "tsx": "^4.19.0",
    "typescript": "^5.6.0"
  }
}
// tsconfig.json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "outDir": "dist",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  },
  "include": ["server.ts"]
}
// server.ts — a custom MCP server over Streamable HTTP (stateless).
import express, { type Request, type Response } from "express";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { z } from "zod";

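function buildServer(): McpServer {
  const server = new McpServer({ name: "weather-mcp", version: "1.0.0" });

  server.registerTool(
    "get_forecast",
    {
      title: "Get forecast",
      description: "Return a short weather forecast for a city.",
      inputSchema: { city: z.string() },
    },
    async ({ city }) => ({
      content: [{ type: "text", text: `Forecast for ${city}: 24°C, clear skies.` }],
    }),
  );

  // Second tool, matching the Python server's convert_temp.
  server.registerTool(
    "convert_temp",
    {
      title: "Convert temperature",
      description: "Convert a Celsius temperature to Fahrenheit.",
      inputSchema: { celsius: z.number() },
    },
    async ({ celsius }) => ({
      content: [{ type: "text", text: `${celsius}°C = ${((celsius * 9) / 5 + 32).toFixed(1)}°F` }],
    }),
  );

  return server;
}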

const app = express();
app.use(express.json());

// Health check so the platform knows the container is alive.
app.get("/health", (_req: Request, res: Response) => {
  res.json({ status: "ok" });
});

// One fresh server + transport per request = stateless = horizontally scalable.
app.post("/mcp", async (req: Request, res: Response) => {
  const server = buildServer();
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  res.on("close", () => {
    transport.close();
    server.close();
  });
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

const port = Number(process.env.PORT ?? 8080);
app.listen(port, () => console.log(`MCP server on :${port}/mcp`));

Install and run it the same way — npm install then npm run dev (or npm run build && npm start for the compiled version). Whichever language you chose, you now have an identical MCP endpoint. The parity is the point:

The same tool, both languages. Whichever you pick, an MCP tool is a name, a described input schema, and a handler that returns content.
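To see what "a name, a described input schema, and a handler" looks like as data, here is a toy, standard-library-only sketch that derives a tools/list-style descriptor from a typed Python function. It is not FastMCP's implementation (FastMCP builds its schemas with Pydantic and adds extra fields such as titles); the tool_descriptor helper and its small scalar-only type table are assumptions made for illustration.

```python
import inspect
from typing import Callable, get_type_hints

# Toy mapping from Python annotations to JSON Schema types (assumption: scalars only).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_descriptor(fn: Callable) -> dict:
    """Build a name + description + inputSchema descriptor from a typed function."""
    hints = get_type_hints(fn)
    params = inspect.signature(fn).parameters
    properties = {name: {"type": JSON_TYPES[hints[name]]} for name in params}
    # Parameters without a default value are required arguments.
    required = [name for name, p in params.items() if p.default is inspect.Parameter.empty]
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def get_forecast(city: str) -> str:
    """Return a short weather forecast for a city."""
    return f"Forecast for {city}: 24°C, clear skies."


def convert_temp(celsius: float) -> str:
    """Convert a Celsius temperature to Fahrenheit."""
    return f"{celsius}°C = {celsius * 9 / 5 + 32:.1f}°F"


if __name__ == "__main__":
    for tool in (get_forecast, convert_temp):
        print(tool_descriptor(tool))
```

The docstring becomes the description the agent reads when deciding whether to call the tool, which is why it is worth writing carefully.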

Test it before you ship it

Two ways to check the server works. The friendly way is the MCP Inspector, an official web UI that connects to your endpoint and lets you list and call tools by hand:

# Launches a local web UI; point it at http://localhost:8080/mcp
npx @modelcontextprotocol/inspector

The bare-metal way is to send the initialize handshake yourself with curl. This is also the fastest way to smoke-test a deployed server — just swap the URL:

curl -s http://localhost:8080/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize",
       "params":{"protocolVersion":"2025-06-18","capabilities":{},
                 "clientInfo":{"name":"curl","version":"0"}}}'
The Accept-header gotcha that trips everyone

Streamable HTTP requires the client to accept BOTH application/json and text/event-stream. Leave text/event-stream out of the Accept header and the server will reject the request with a 406 — one of the most common “why won't it connect?” mysteries.
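Because the server may answer with either a single JSON body or an SSE stream, a client has to handle both. Here is a minimal Python sketch (standard library only) that extracts the JSON-RPC messages from either response shape. It implements just enough of the SSE format for this purpose (data: lines are joined until a blank line ends the event) and ignores event:, id:, retry:, and comment lines; the function names are our own.

```python
import json


def parse_sse(body: str) -> list[dict]:
    """Collect the JSON payload of each event in a text/event-stream body."""
    messages: list[dict] = []
    data: list[str] = []
    # The appended "" flushes the last event even if the body lacks a final blank line.
    for line in body.splitlines() + [""]:
        if line == "":
            if data:
                messages.append(json.loads("\n".join(data)))
                data = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, a single leading space after the colon is dropped.
            data.append(value[1:] if value.startswith(" ") else value)
        # "event:", "id:", "retry:" and ":comment" lines are ignored in this sketch.
    return messages


def read_mcp_response(content_type: str, body: str) -> list[dict]:
    """Return JSON-RPC messages from either a JSON or an SSE response."""
    if content_type.split(";")[0].strip() == "text/event-stream":
        return parse_sse(body)
    return [json.loads(body)]


sample = (
    "event: message\n"
    'data: {"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2025-06-18"}}\n'
    "\n"
)
print(read_mcp_response("text/event-stream", sample))
```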

Under the hood, every MCP interaction is that same short conversation: negotiate, discover, call, respond.

  • initialize: client and server agree on protocol version and capabilities
  • tools/list: the client asks what tools exist; the server returns their JSON schemas
  • tools/call: the client invokes a tool with JSON arguments
  • your handler runs: your code executes, calling an API or querying a database
  • result: the server returns content, optionally streamed back over SSE

A Streamable HTTP exchange end to end. Everything else is a variation on this.
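That five-step summary leaves out one message worth knowing about. After initialize succeeds, the client sends a notifications/initialized notification (no id, no reply), and if the server returned an Mcp-Session-Id header, the client echoes it on every later request; a stateless server never issues one. The toy client below walks the whole sequence against an in-memory fake server. MiniMcpClient, fake_weather_server, and the transport callable that hides the HTTP details are illustrative assumptions, not part of either SDK.

```python
import itertools
from typing import Callable, Optional, Tuple

# A transport sends one JSON-RPC message (plus the current session id, if any) and
# returns (response or None, Mcp-Session-Id header value or None).
Transport = Callable[[dict, Optional[str]], Tuple[Optional[dict], Optional[str]]]


class MiniMcpClient:
    """Toy client: initialize -> notifications/initialized -> tools/list -> tools/call."""

    def __init__(self, transport: Transport) -> None:
        self._send = transport
        self._ids = itertools.count(1)
        self.session_id: Optional[str] = None  # stays None against a stateless server

    def _request(self, method: str, params: dict) -> dict:
        msg = {"jsonrpc": "2.0", "id": next(self._ids), "method": method, "params": params}
        response, session_id = self._send(msg, self.session_id)
        if session_id is not None:
            self.session_id = session_id  # echo it on every later request
        if response is None or "error" in response:
            raise RuntimeError(f"{method} failed: {response}")
        return response["result"]

    def _notify(self, method: str) -> None:
        # Notifications carry no id, and the server sends no JSON-RPC reply.
        self._send({"jsonrpc": "2.0", "method": method}, self.session_id)

    def call(self, tool: str, arguments: dict) -> dict:
        self._request("initialize", {
            "protocolVersion": "2025-06-18",
            "capabilities": {},
            "clientInfo": {"name": "mini", "version": "0"},
        })
        self._notify("notifications/initialized")
        names = {t["name"] for t in self._request("tools/list", {})["tools"]}
        if tool not in names:
            raise KeyError(f"server has no tool named {tool!r}")
        return self._request("tools/call", {"name": tool, "arguments": arguments})


def fake_weather_server(msg: dict, session_id: Optional[str]):
    """In-memory stand-in for weather-mcp (stateless: never issues a session id)."""
    if "id" not in msg:
        return None, None  # notification: nothing to reply
    method, params = msg["method"], msg.get("params", {})
    if method == "initialize":
        result = {"protocolVersion": "2025-06-18", "capabilities": {"tools": {}},
                  "serverInfo": {"name": "weather-mcp", "version": "1.0.0"}}
    elif method == "tools/list":
        result = {"tools": [{"name": "get_forecast",
                             "description": "Return a short weather forecast for a city.",
                             "inputSchema": {"type": "object",
                                             "properties": {"city": {"type": "string"}},
                                             "required": ["city"]}}]}
    elif method == "tools/call":
        city = params["arguments"]["city"]
        result = {"content": [{"type": "text", "text": f"Forecast for {city}: 24°C, clear skies."}]}
    else:
        error = {"code": -32601, "message": "Method not found"}
        return {"jsonrpc": "2.0", "id": msg["id"], "error": error}, None
    return {"jsonrpc": "2.0", "id": msg["id"], "result": result}, None


if __name__ == "__main__":
    print(MiniMcpClient(fake_weather_server).call("get_forecast", {"city": "Paris"}))
```

Swapping fake_weather_server for a function that POSTs each message to your /mcp URL turns this into a real, if bare-bones, client; in practice the official SDK clients handle these details for you.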

Containerize once, deploy anywhere

All three clouds we're targeting run containers, which means you write the packaging once and reuse it everywhere. Here are the two Dockerfiles; pick the one for your language. Both expose port 8080, which is what our servers listen on by default and what Cloud Run routes to by default; the Azure and AWS commands below set it explicitly.

# Dockerfile — Python
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY server.py .
ENV HOST=0.0.0.0
EXPOSE 8080
CMD ["python", "server.py"]
# Dockerfile — Node / TypeScript (multi-stage: build, then slim runtime)
FROM node:22-slim AS build
WORKDIR /app
COPY package.json tsconfig.json ./
RUN npm install
COPY server.ts .
RUN npm run build

FROM node:22-slim
WORKDIR /app
COPY package.json ./
RUN npm install --omit=dev
COPY --from=build /app/dist ./dist
EXPOSE 8080
CMD ["node", "dist/server.js"]

Test the image locally before you push it anywhere: docker build -t weather-mcp . && docker run -p 8080:8080 weather-mcp, then re-run the curl from above. If it answers locally in a container, the same image should answer in the cloud; what changes is the URL in front of it and any auth you add.

Deploy to Google Cloud Run

Cloud Run is the shortest path from source to a public HTTPS URL: one command builds your Dockerfile and deploys it. It scales to zero when idle and supports the streaming responses MCP needs.

# One-time: select your project and enable the APIs
gcloud config set project YOUR_PROJECT_ID
gcloud services enable run.googleapis.com cloudbuild.googleapis.com artifactregistry.googleapis.com

# Build from source (uses your Dockerfile) and deploy
gcloud run deploy weather-mcp \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --port 8080

# → Service URL: https://weather-mcp-XXXXXXXX-uc.a.run.app
# → MCP endpoint: https://weather-mcp-XXXXXXXX-uc.a.run.app/mcp

That's it — you have a live MCP server. --allow-unauthenticated makes it publicly reachable so you can test immediately; we'll add application-level auth before this is anything but a demo.

Deploy to Azure Container Apps

Azure Container Apps has the same source-to-URL ergonomics via az containerapp up, which builds your image, provisions an environment and registry if needed, and gives you an external HTTPS ingress. It also scales to zero.

# One-time: install the extension and make a resource group
az extension add --name containerapp --upgrade
az group create --name mcp-rg --location eastus

# Build from source and deploy with a public ingress on port 8080
az containerapp up \
  --name weather-mcp \
  --resource-group mcp-rg \
  --location eastus \
  --source . \
  --ingress external \
  --target-port 8080

# → MCP endpoint: https://weather-mcp.<environment-id>.<region>.azurecontainerapps.io/mcp
#   (the exact FQDN prints when the command finishes)

Deploy to AWS with Amazon ECS Express Mode

Not App Runner — it's closing to new customers

AWS App Runner stops accepting new customers on April 30, 2026 (existing accounts keep it, but it gets no new features). Its successor, launched at re:Invent 2025, is Amazon ECS Express Mode: the same one-command simplicity, running on Fargate under the hood, with the full ECS feature set behind it. New projects should start here.

ECS Express Mode takes a container image and provisions the load balancer, target group, security groups, HTTPS URL, and CPU-based autoscaling for you — one command, no infrastructure to hand-wire. It pulls from ECR and needs two IAM roles (a task execution role and an infrastructure role), so the flow is: push the image, create the roles once, then create the service. Every command is included.

REGION=us-east-1
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
REPO=$ACCOUNT.dkr.ecr.$REGION.amazonaws.com/weather-mcp

# 1) Push the image to ECR (a default VPC with public subnets must exist)
aws ecr create-repository --repository-name weather-mcp --region $REGION
aws ecr get-login-password --region $REGION \
  | docker login --username AWS --password-stdin $ACCOUNT.dkr.ecr.$REGION.amazonaws.com
# --platform matters on Apple Silicon: Fargate defaults to x86_64
docker build --platform linux/amd64 -t weather-mcp .
docker tag weather-mcp:latest $REPO:latest
docker push $REPO:latest

# 2) One-time: the two roles Express Mode needs (task execution + infrastructure)
aws iam create-role --role-name ecsTaskExecutionRole \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ecs-tasks.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
aws iam attach-role-policy --role-name ecsTaskExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy

aws iam create-role --role-name ecsInfrastructureRoleForExpressServices \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ecs.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
aws iam attach-role-policy --role-name ecsInfrastructureRoleForExpressServices \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSInfrastructureRoleforExpressGatewayServices

# 3) Deploy — Express Mode builds the ALB, autoscaling, and HTTPS URL for you
aws ecs create-express-gateway-service \
  --service-name weather-mcp \
  --primary-container '{"image":"'$REPO':latest","containerPort":8080}' \
  --execution-role-arn arn:aws:iam::$ACCOUNT:role/ecsTaskExecutionRole \
  --infrastructure-role-arn arn:aws:iam::$ACCOUNT:role/ecsInfrastructureRoleForExpressServices \
  --health-check-path /health \
  --monitor-resources

# → MCP endpoint: https://weather-mcp.ecs.us-east-1.on.aws/mcp
#   (the exact URL prints in the command output once status is ACTIVE)
It runs on Fargate — no scale-to-zero

Express Mode keeps at least one Fargate task running and autoscales up on CPU, so there's no cold start but no idle-to-zero either. If scale-to-zero matters on AWS, wrap the same container with the Lambda Web Adapter behind a Function URL instead. One gotcha: IAM roles are eventually consistent, so if the first create call fails with an assume-role error, wait a minute and retry.
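If you script the deployment, you can automate that wait-and-retry advice. Here is a minimal Python sketch, assuming you shell out to the AWS CLI with subprocess: it reruns a command with a growing delay while the error text looks like an assume-role failure. The run_with_retry name, the default delay, and the "assume" substring check are assumptions; adjust the predicate to the error message you actually see.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retry(
    cmd: Sequence[str],
    *,
    attempts: int = 5,
    delay: float = 20.0,
    retryable: Callable[[str], bool] = lambda err: "assume" in err.lower(),
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run cmd, retrying with linearly growing waits while IAM changes propagate."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        proc = runner(list(cmd), capture_output=True, text=True)
        if proc.returncode == 0:
            return proc.stdout
        # Give up on the last attempt, or immediately on errors that retrying won't fix.
        if attempt == attempts or not retryable(proc.stderr):
            raise RuntimeError(f"{cmd[0]} failed after {attempt} attempt(s): {proc.stderr.strip()}")
        sleep(delay * attempt)  # 20s, 40s, 60s, ... with the default delay
    raise AssertionError("unreachable")
```

Call it with the full argument list of the create command, exactly as you would type it; runner and sleep are injectable so the retry logic can be tested without touching AWS.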

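Three clouds, one container. Here's how the three targets compare on the things that matter for an MCP server:

  • Google Cloud Run (gcloud run deploy --source .): scales to zero; supports streaming / SSE; cold starts measured in seconds; auth via IAM or app-level; best for going from source to an HTTPS URL in one command.
  • Azure Container Apps (az containerapp up --source .): scales to zero; builds the image and gives you an external HTTPS ingress in one command.
  • Amazon ECS Express Mode (aws ecs create-express-gateway-service): keeps at least one Fargate task running, so no cold start but no scale-to-zero; provisions the load balancer, HTTPS URL, and CPU-based autoscaling for you.

The same image runs on all three. They differ mostly on scale-to-zero, cold starts, and how you bolt on auth.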

Make it production-safe

Statelessness is not optional

This is the mistake that turns a working demo into a 3 a.m. incident. If your server keeps session state in one instance's memory, it works perfectly with one instance and breaks the moment the autoscaler adds a second, because requests routed to the new instance can't find the session. Building a fresh server and transport per request, as our TypeScript code does, sidesteps the whole problem; in Python, FastMCP's stateless_http=True setting gives you the same behavior.

Stateful pins each session to one instance, so a load balancer that spreads requests across instances turns them into failures. Stateless lets any instance serve any request, so the autoscaler can add instances and it just works.
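A toy model makes the failure mode concrete. This standard-library Python simulation puts hypothetical instances behind a round-robin balancer with no session affinity. In stateful mode each instance remembers only the sessions it created, so follow-up calls that land elsewhere fail (a real server answers an unknown Mcp-Session-Id with HTTP 404); in stateless mode there is nothing to look up. Instance, simulate, and the default counts are illustrative, not measurements.

```python
import itertools


class Instance:
    """One container instance. `sessions` lives in this process's memory only."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sessions: set[str] = set()

    def handle(self, request: dict, stateful: bool) -> str:
        if not stateful:
            return "ok"  # fresh server + transport per request: nothing to look up
        if request["method"] == "initialize":
            session_id = f"{self.name}-{len(self.sessions)}"
            self.sessions.add(session_id)
            return session_id
        # A session created on another instance is unknown here.
        return "ok" if request["session"] in self.sessions else "unknown session"


def simulate(stateful: bool, instances: int = 3, sessions: int = 5, calls_per_session: int = 4) -> int:
    """Count failed tool calls when a round-robin balancer spreads requests across instances."""
    pool = [Instance(f"i{n}") for n in range(instances)]
    balancer = itertools.cycle(pool)  # no session affinity
    failures = 0
    for _ in range(sessions):
        session_id = next(balancer).handle({"method": "initialize"}, stateful)
        for _ in range(calls_per_session):
            result = next(balancer).handle({"method": "tools/call", "session": session_id}, stateful)
            if result != "ok":
                failures += 1
    return failures


if __name__ == "__main__":
    print("stateful failures: ", simulate(stateful=True))   # 15 of 20 calls fail
    print("stateless failures:", simulate(stateful=False))  # 0
```

With a single instance the stateful mode also reports zero failures, which is exactly why the bug hides until the autoscaler adds a second box.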

Lock the door

The moment your server has a public URL, it's on the internet and anyone can call your tools. The minimum bar is a bearer token; the production bar is OAuth per the MCP authorization spec. Here's the minimum, as Express middleware — put it in front of the /mcp route and set the token as a platform secret, never in code:

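// Set MCP_TOKEN via the platform's secret store (not in the image).
// Register this before app.post("/mcp", ...) so it runs first.
app.use("/mcp", (req, res, next) => {
  const token = process.env.MCP_TOKEN;
  // Fail closed: with no token configured, reject everything instead of
  // accepting the literal header "Bearer undefined".
  if (!token || req.headers.authorization !== `Bearer ${token}`) {
    res.status(401).json({ error: "unauthorized" });
    return;
  }
  next();
});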
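The Python server can get the same guard. Here is a minimal sketch written as plain ASGI middleware, so it needs no third-party imports: it fails closed when MCP_TOKEN is unset, compares tokens in constant time with hmac.compare_digest, and leaves /health open. BearerAuthMiddleware is our own name; wiring it in assumes your SDK version exposes FastMCP.streamable_http_app(), in which case you would serve BearerAuthMiddleware(mcp.streamable_http_app()) with an ASGI server such as uvicorn instead of calling mcp.run().

```python
import hmac
import json
import os
from typing import Any, Awaitable, Callable, MutableMapping, Optional

Scope = MutableMapping[str, Any]
Receive = Callable[[], Awaitable[MutableMapping[str, Any]]]
Send = Callable[[MutableMapping[str, Any]], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]


class BearerAuthMiddleware:
    """Reject HTTP requests under `prefix` unless they carry `Authorization: Bearer <token>`."""

    def __init__(self, app: ASGIApp, token: Optional[str] = None, prefix: str = "/mcp") -> None:
        self.app = app
        # Read the secret from the environment (set via the platform's secret store).
        self.token = token if token is not None else os.environ.get("MCP_TOKEN", "")
        self.prefix = prefix

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        # Lifespan events and unprotected paths such as /health pass straight through.
        if scope["type"] != "http" or not scope["path"].startswith(self.prefix):
            await self.app(scope, receive, send)
            return
        headers = dict(scope.get("headers") or [])  # ASGI header names are lowercase bytes
        supplied = headers.get(b"authorization", b"")
        expected = f"Bearer {self.token}".encode()
        # Fail closed when no token is configured; compare in constant time otherwise.
        if not self.token or not hmac.compare_digest(supplied, expected):
            body = json.dumps({"error": "unauthorized"}).encode()
            await send({
                "type": "http.response.start",
                "status": 401,
                "headers": [(b"content-type", b"application/json"),
                            (b"content-length", str(len(body)).encode())],
            })
            await send({"type": "http.response.body", "body": body})
            return
        await self.app(scope, receive, send)
```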

Auth is where MCP servers most often go wrong, and it's exactly the “ability to act” edge of the risk model: a public endpoint that runs tools with your credentials is a confused deputy waiting to happen. For the full layered treatment (input quarantine, tool allowlists, output checks), see Securing Agentic AI: A Layered Defense. Beyond auth, three more practices matter in the cloud: pull every secret from the platform's secret store, set CORS deliberately if a browser client will connect, and raise the request timeout so a long-running tool call isn't cut off mid-stream.
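For a browser-based client, setting CORS deliberately means an explicit origin allowlist, plus letting the browser send the MCP headers and read the Mcp-Session-Id response header. Here is a minimal Python sketch; the allowed origin is a hypothetical placeholder, and cors_headers is our own helper to plug into whatever framework serves your endpoint (answer OPTIONS preflights with the same headers).

```python
from typing import Optional

# Hypothetical allowlist: replace with the origin(s) your browser client is served from.
ALLOWED_ORIGINS = frozenset({"https://app.example.com"})


def cors_headers(origin: Optional[str], allowed: frozenset = ALLOWED_ORIGINS) -> dict:
    """CORS response headers for an allowlisted Origin; an empty dict for anything else."""
    if origin is None or origin not in allowed:
        return {}  # without these headers the browser blocks the cross-origin response
    return {
        "Access-Control-Allow-Origin": origin,  # echo the single allowed origin, not "*"
        "Access-Control-Allow-Methods": "GET, POST, DELETE, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type, Authorization, Mcp-Session-Id, MCP-Protocol-Version",
        "Access-Control-Expose-Headers": "Mcp-Session-Id",  # lets browser code read the session id
        "Vary": "Origin",  # shared caches must not reuse this response for another origin
    }


if __name__ == "__main__":
    print(cors_headers("https://app.example.com"))
    print(cors_headers("https://elsewhere.example"))  # {}
```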

Connect it to an agent

A deployed MCP server is only useful once something talks to it. Any MCP-capable client (Claude Desktop, an IDE, a custom agent, or AgentSwarms) connects the same way: point it at your https://…/mcp URL with the Streamable HTTP transport and, if you added one, the bearer token. From that moment your tools show up in the client's tool list and the agent can call them like any other capability. That's the whole payoff of doing the deployment properly: the tools you wrote once are now reachable by every agent you build.

The pre-flight checklist

Before you call a custom MCP server “deployed,” walk this list. Every item maps to a failure we've watched teams hit on their first cloud MCP server.


Where AgentSwarms fits

Once your server is live, AgentSwarms is where you put it to work — and where you learn the surrounding craft if any of the above was new.

  • Connect the server you just deployed — the MCP integrations surface lets you point AgentSwarms at your https://…/mcp URL and expose its tools to any agent in your workspace.
  • Give an agent the tools — in the Agent Builder, attach your MCP server so the agent can call get_forecast (or your real tools) as part of its reasoning.
  • Learn the frameworks by building — the interactive notebooks walk through MCP, tool design, and agent wiring on real runnable code, so the concepts this guide assumes become muscle memory.
  • Prototype the whole workflow — the templates library and visual swarm canvas let you drop your new tools into a multi-agent workflow and watch them run, without wiring plumbing by hand.

The takeaway is smaller than it looks: a custom MCP server is just a small web service with a specific shape, and the cloud already knows how to run small web services. Pick Streamable HTTP, keep it stateless, put a token on the door, and the same container you built on your laptop runs on Cloud Run, Container Apps, or ECS Express Mode unchanged. Write the tool once; let every agent you build reach it.

Ship one this week

Start with the smallest useful tool you actually want an agent to have, deploy it to whichever cloud you already use, and connect it in the MCP integrations surface. A live, called-in-anger server teaches you more than any amount of reading — including this post.

