Full curriculum

Beginner to production, one curriculum.

Eight tracks. 50+ in-depth lessons. 50+ runnable real-world agents & swarms. Every concept is paired with a live demo you can fork in one click. All free.

8: learning tracks
50+: in-depth lessons
97: runnable notebooks
50+: one-click agents & swarms

Learning tracks

A clear path, in order

Start at Track 01 if you're new. Skip ahead if you've shipped with LLMs before — every track stands on its own.

Field manuals · Senior depth

Five field manuals turn this curriculum into a senior-engineer reference.

At the end of Foundations, Engineering Rigor, SQL & BI Agents, Production & Business, and RAG & Frameworks, long-form field manuals go one level below the chapter — tokenization economics, KV-cache math, schema-linking failure modes, EU AI Act obligations, Reciprocal Rank Fusion, embedding lifecycle, framework lock-in — with worked numerical examples and primary-source citations. If you only have time for one pass, the manuals are the difference between knowing the words and knowing the system.

Track 01
Beginner
~3 hours
Foundations of Generative & Agentic AI
Start here if you've never built with LLMs. Every concept is explained twice: once like you're 10, once for the engineer in the room.
What's inside
- What is a model? (LLM families, base vs instruct, open vs closed)
- Tokens, context windows, and why they cost money
- Prompts, system messages, and few-shot patterns
- Embeddings, vector search, and the retrieval mindset
- What makes something an "agent" vs a chatbot
- Glossary of every term you'll see in the wild
Live templates included
- First Prompt Lab
- Token Counter Demo
Track 02
Beginner → Intermediate
~4 hours
Patterns, Tools & Guardrails
The seven canonical agentic patterns — wired up live. Tool use, RAG, planner-executor, reflection, routing, parallel fan-out, and HITL approvals.
What's inside
- Tool / function calling — the OpenAI schema in plain English
- Retrieval-Augmented Generation (RAG) with citations
- Modern RAG variants: hybrid search, contextual retrieval, agentic & multi-modal
- Graph RAG — entities, relations, multi-hop reasoning (Microsoft GraphRAG style)
- Planner → Executor pattern
- Reflection & self-critique loops
- Routing & classifier-as-controller
- Parallel fan-out / map-reduce agents
- Human-in-the-Loop approvals for risky actions
- Input/output guardrails: PII, prompt injection, schema validation
Live templates included
- Product Support Bot (RAG)
- Graph RAG Researcher swarm (Acme Corp demo KB)
- Code Reviewer (Tools + Guardrails)
- Approval Inbox demo
- Planner-Executor sandbox
Track 03
Beginner → Advanced
~2.5 hours
Agent Memory: Short-Term & Long-Term
Why a chatbot forgets you and an agent doesn't. The two memory tiers (STM and LTM), how recall actually works under the hood, and how to configure both per-agent and per-swarm-node — live, on the platform.
What's inside
- The mental model: scratchpad (STM) vs notebook (LTM)
- Sliding window + rolling summary — how STM survives long chats
- Long-term memory items: facts, preferences, episodic notes, instructions
- Recall: keyword overlap + score + recency (and where embeddings fit later)
- Auto-extraction after every turn — what gets saved, what gets skipped
- Memory tools the agent can call: remember, recall, forget, set, get
- Swarm scope: share memory with the agent, isolate to one run, or none
- PII safety, capacity caps, and pruning low-value items
Live templates included
- Personal Assistant with LTM
- Long-Conversation Tutor (STM summarizer)
- Swarm with shared scratchpad
Track 04
Intermediate → Advanced
~3.5 hours
Engineering Rigor & Deep Mental Models
The senior-engineer view of agents: state, planning, multi-agent protocols, control topology, deterministic vs emergent design, failure handling, eval at scale, and system design under latency/cost constraints. With diagrams, code, and citations to the canonical papers.
What's inside
- The four axes: state, planning, communication, control topology
- Deterministic workflows vs emergent agentic loops (Anthropic's line)
- Failure handling: timeouts, jittered retries, circuit breakers, sagas
- Idempotency keys + structured-output validation with repair loop
- Loop detection and step / token / cost ceilings
- The 4-layer eval pyramid: unit → golden → trajectory → online
- System design under constraints: latency budgets, model cascading, caching, parallel tools
- Centralised vs hierarchical vs peer-to-peer vs market topologies
Live templates included
- Failure-handling diagrams
- Eval flywheel reference
- τ-bench / AgentBench links
Track 05
Intermediate
~3 hours
Text-to-SQL & Data Agents
Turn natural language into safe SQL. AST validation, table allow-listing, schema-aware prompting, and the realities of running this in production (Uber QueryGPT-style).
What's inside
- Why text-to-SQL is harder than it looks
- Schema introspection and few-shot grounding
- AST parsing and validating generated SQL before execution
- Read-only enforcement and table allow-lists
- Cost & row-limit guardrails
- Case study: Uber QueryGPT, Snowflake Cortex Analyst
- Hands-on: query the SaaS sales lakehouse with English
Live templates included
- SQL Analyst Agent
- RevOps Multi-Agent Swarm
Track 06
Intermediate → Advanced
~4 hours
Multi-Agent Swarms
When one agent isn't enough. Build researcher → writer → reviewer pipelines, peer-to-peer collaboration, A2A handoffs, and shared memory across the swarm.
What's inside
- Orchestrator vs peer-to-peer architectures
- Handoff messages, shared scratchpads, and turn limits
- A2A (Agent-to-Agent) protocol basics
- When to split a single agent into a swarm
- Per-node memory scoping (agent / swarm / none)
- Cost & loop-detection guardrails for swarms
- Visual swarm canvas — drag, wire, run
Live templates included
- Research → Writer → Reviewer swarm
- Customer Support Triage swarm
- RevOps SQL Analytics swarm
Track 07
Advanced
~3 hours
Scaling, Observability & Enterprise
Production reality: traces, evals, ROI math, security, OpenAI-compatible gateways, multi-provider strategy. Real case studies from Klarna, Uber, Salesforce.
What's inside
- Reading execution traces and debugging cost spikes
- Production eval ops: regression gates in CI, drift alarms, weekly failure review
- Token, latency & cost dashboards
- AI security: prompt injection, data exfiltration, PII
- OpenAI-compatible gateways and multi-provider routing
- ROI formulas and enterprise cost scenarios
- Maturity model: from prototype to org-wide platform
- Case studies: Klarna, Uber, Salesforce Agentforce, BMW
Live templates included
- Trace Inspector
- Budget Caps demo
- Multi-Provider Gateway
Track 08
Advanced → Expert
~4 hours
Deep Dives — the production gaps most curriculums skip
Six hard-won lessons from real production failures: orchestration architecture, deterministic skeletons vs probabilistic workers, MCP security (Confused Deputy + Tool Description Hijacking), Actor-Model swarms with durable state, heterogeneous routing economics, and voice agents (the STT→LLM→TTS loop and cloud reference architectures). Assumes you finished Tracks 01–07.
What's inside
- Hub-and-Spoke beats both monolithic master agents AND peer-to-peer mesh
- Thin Agent pattern: deterministic state machine + ephemeral <150-line workers
- MCP attack surface: Tool Description Hijacking, Confused Deputy, Shadow AI infra
- Actor Model runtimes: thousands of I/O-bound agents per host with durable checkpoints
- Heterogeneous routing: SLM routers + frontier-LLM escalation = positive ROI
- Voice agents: latency budgets, VAD & barge-in, and AWS/GCP/Azure reference stacks
- Mapping AgentSwarms to the Levels-of-Autonomy framework (L1 → L5)
Live templates included
- Frameworks deep dive (CrewAI / LangGraph / AutoGen)
- Stack examples by industry

97 runnable notebooks

A full notebook lab, in your browser — no setup

Every notebook is a real Jupyter-style workspace with editable cells, live execution, and concept-by-concept explanations. Twelve curated tracks covering the frameworks and patterns the industry actually ships.

Foundations & Prompting

2 notebooks

Prompt engineering, multimodal & vision, structured JSON — built on raw fetch.

Failure Modes Lab

1 notebook

12 real production failures — hallucinations, injection, RAG poisoning, cost blow-ups — with fixes.

Agentic Evaluation

8 notebooks

Deterministic vs semantic, LLM-as-judge, juries, judge calibration, RAG triad, trajectory, red-team, operational metrics.

LangChain & LangGraph

5 notebooks

Chat, tools, agents, memory + human-in-the-loop, embeddings & RAG.

LlamaIndex.ts

7 notebooks

Nodes, vector index, sentence-window, router, sub-question, data agents, faithfulness eval.

OpenAI Agents SDK

6 notebooks

Agents & the run loop, handoffs & triage, guardrail tripwires, structured output, sessions, streaming & tracing.

Vercel AI SDK

6 notebooks

generateText/streamText, typed objects with Zod, tool() + multi-step loops, the Agent class.

Google ADK (TypeScript)

6 notebooks

LlmAgent, FunctionTool, Sequential/Parallel/Loop, multi-agent transfer, callbacks & safety.

Standalone Agents

10 notebooks

Hand-rolled agent patterns end to end — no framework, every step visible.

Multi-Agent Systems (LangGraph.js)

6 notebooks

State machines, supervisors, parallel workers, and durable multi-agent graphs.

Enterprise Ops & Safety

3 notebooks

Cost routing, semantic chunking, and building an MCP server.

Real-world Examples

6 notebooks

Applied end-to-end builds that combine retrieval, tools, and evaluation.

Real packages · real model calls · editable cells · no install

One-click runnables

Don't just read about multi-agent systems — run real ones

Every track ships with production-shaped agents and swarms you can launch in a single click — wired with knowledge bases, SQL tools, RAG retrievers, guardrails, approval gates and observable traces. Open the canvas, fire the suggested prompt, watch each node light up in real time, then fork it and break it.

Live canvas

See every handoff, tool call & token as it streams

Real KBs

Pre-seeded help-center, research & ERP corpora

HITL approvals

Risky actions pause for a human, just like prod

Fork & remix

Edit the system prompt, swap models, re-run

Customer Support

2 swarms · 3 agents

Multi-agent swarms

Customer Support Triage
Classifier → Responder → QA reviewer with human approval
LLM-as-a-Judge — Support QA
Tone + Policy + Technical sub-judges → Chief Magistrate scorecard

Standalone agents

Product Support Assistant
Grounded RAG over a real help-center KB, with citations & refusals
Support Ticket Triage Agent
Auto-tags category + priority for incoming tickets
SLA Breach Detector
Calculates ticket age against SLA and flags every breach

Sales & RevOps

2 swarms · 2 agents

Multi-agent swarms

Sales Lead Enrichment
Intake → Enricher → Scorer → Email drafter (approval-gated)
SaaS RevOps — Multi-Agent SQL Analyst
VP prompt → SQL Planner → Local DB → RevOps Analyst → Strategic Synthesizer → Deal Desk

Standalone agents

Sales Outreach Drafter
Personalized outbound drafts — humans approve before sending
CRM Data Cleanser
Coerces messy text into strict JSON for HubSpot / Salesforce ingestion

Research & Knowledge

3 swarms · 3 agents

Multi-agent swarms

Research → Report Writer
Planner → Researcher → Synthesizer → Editor pipeline
Graph RAG Researcher
Graph search → Document RAG → Synthesizer for multi-hop questions
gRED-style Drug Discovery Research
Planner → Literature + Assay + Internal-data → Reconciliation → Hypothesis memo

Standalone agents

Research Q&A on AI Papers
Cited answers over a curated library of foundational AI papers
Long-Context Document Analyst
Claude-powered summarizer for contracts & long transcripts
Graph RAG Explorer (Acme Corp)
Multi-hop questions over a pre-built entity-relation knowledge graph

Engineering & Quality

3 swarms · 4 agents

Multi-agent swarms

Code Review Pipeline
Static summarizer → Security + Style reviewers → Merged comment
RAG Evaluation Harness — LLM as a Judge
Two candidates → GPT-5 judge → structured rubric scorecard
Content Moderation QA with Evaluate Node
Toxicity + Misinfo + Policy → Moderator → quality-gate Evaluate node

Standalone agents

Code Review Assistant
Reviews a pasted diff for bugs, style and security issues
Python Traceback Fixer
Reads a Python error log and writes the exact fix
Regex Generator
Translates plain English into ready-to-use regular expressions
SQL Dialect Translator
Converts queries between MySQL, Postgres, SQL Server, and SQLite

Failure-pattern labs (debugging)

3 swarms

Multi-agent swarms

Infinite Tool Loop
Watch an agent burn tokens calling the same tool forever — then see the fix
JSON Wrapper Crash
An LLM adds "Sure! Here's the JSON:" — the next agent throws SyntaxError
Context Window Collapse
Three workers dump 40 pages into one Synthesizer — watch it forget the question

Financial Services

4 swarms

Multi-agent swarms

Earnings Call Analyst Desk
Numbers + Tone + Risk → Compliance check → Analyst memo
Stock Investment — CIO Swarm
Fundamental + News + Quant + Risk → CIO investment memo
Responsible AI Guardrails (Banking)
PII Redactor → Safety Classifier → Guardrailed Responder ⊕ Refusal → Audit Log
Financial Variance — ERP + RAG
Orchestrator → ERP Data + RAG Doc Agent → FP&A Synthesizer

Healthcare & Life Sciences

2 swarms

Multi-agent swarms

Clinical Intake & Prior-Authorization
Symptom intake → Triage → Differential → Coding → Prior-auth → Clinician approval
Agentic RAG — Drug Safety Investigation
Router → parallel KB + Graph + SQL retrievers → Self-Eval loop → Synthesizer

Legal, HR & Operations

3 swarms · 1 agent

Multi-agent swarms

Contract Redline & Risk Review
Clause splitter → parallel risk / definitions / jurisdiction reviewers → Partner approval
Frontline Hiring Orchestrator
Manager → Sourcing + Screener + Scheduler + Onboarding → Recruiter approval
Disaster Response — Crisis Triage at Scale
Intake classifier → Severity scorer → Resource matcher → Field-team router

Standalone agents

Meeting Action-Item Extractor
Pulls assigned tasks and deadlines out of long meeting transcripts

Industry verticals

5 swarms

Multi-agent swarms

Auto-Claims FNOL Triage (Insurance)
Intake → Coverage Check (RAG) → Fraud Signals (SQL) → Reserve & Routing
Manufacturing — Quality NCR Root-Cause
Defect intake → Spec lookup → History query → 5-Whys → CAPA draft
SOC Alert Triage (Cybersecurity)
Alert intake → ATT&CK enrichment → SQL correlation → Containment proposal
Retail Returns & Reverse-Logistics
Return intake → Policy RAG → Customer-history SQL → Disposition decision
Adaptive Socratic Tutor + Auto-Grader (Education)
Diagnose misconception → Curriculum RAG → Socratic hint → Grader

Marketing, Web & Content

2 swarms · 7 agents

Multi-agent swarms

Autonomous Localization & Compliance
Creative Director → parallel Copywriter + Designer → Compliance loop
Autonomous Ad Campaign Engine
Brief + reference photo → parallel copy & image gen → vision QA → approved ad

Standalone agents

Landing Page Roaster
Critiques marketing copy against direct-response copywriting frameworks
SEO Meta-Tag Generator
SEO titles + meta descriptions within strict character limits
Brand Voice Translator
Rewrites copy in any persona, from Gen-Z to Victorian novelist
Firecrawl Web Summarizer
Scrapes a live URL and answers a precise question about its content
Competitor Feature Tracker
Scrapes a changelog and surfaces the updates you care about
GitHub Repo Explainer
Reads a public README and explains it for a non-technical audience
Invoice Parser
Extracts vendor, totals, tax and due dates from pasted invoice text

Why this matters for learning: Agentic AI is a systems discipline — handoffs, loops, retries, guardrails, traces. You can't internalize that from prose. Every swarm above is the lesson made tangible: the moment you watch the QA reviewer reject the responder's draft and trigger a retry, or see the Evaluate node fail-close on a toxic generation, the theory clicks. Each runnable is the laboratory for the chapter that introduced it.

Voice agents

Build agents you can talk to

Give any agent a voice. You speak, it transcribes your words, thinks with its full brain — tools, knowledge, memory and guardrails — and answers out loud. AgentSwarms wires the whole speech-to-speech loop, so a voice agent runs the same engine as a chat agent. Everything you learn in the tracks above applies unchanged.

You speak

Mic captured in the browser

Speech-to-text

Transcribed to a message

The agent thinks

Tools · RAG · memory · guardrails

Spoken reply

Answered in a natural voice

Try it in one click: the Voice Playground ships ready-to-talk sample agents — a support-triage agent, a B2B discovery caller and a Spanish tutor. Talk to one, then fork it. Or flip a single switch (New Voice Agent) to turn an agent you built into one you can speak with.

Spotlight · Agent Memory

Why a chatbot forgets you — and an agent doesn't

Memory is the single biggest leap from "stateless chatbot" to "real assistant." Here's the same concept explained two ways — once for beginners, once for engineers — with everything you can actually configure on AgentSwarms today.

For beginners

Think of it like a person at a desk

An LLM has no memory by itself — every request starts from scratch. AgentSwarms gives every agent two memory aids:

Short-Term Memory (STM) — the scratchpad
Whatever was said recently in this chat. When the chat gets too long, older parts get squeezed into a one-paragraph summary so the agent never "forgets" the earlier topic.
Long-Term Memory (LTM) — the notebook
Durable notes the agent jots down across every conversation — your name, what you're working on, your preferences. Next week, in a brand-new chat, it still remembers.

In one sentence: STM is "what we just talked about," LTM is "what I know about you."

For engineers

How it actually works under the hood

STM = sliding window + rolling summary
Last N messages (default 20) are sent verbatim. Anything older is folded into a running summary stored on conversation_memory.summaryand prepended as a system block on every turn.
LTM = typed memory items + scored recall
Items are fact / preference / episodic / instruction. Recall ranks by keyword overlap (GIN index) + stored score + recency, top-K injected as a "What you remember about this user" block. Pluggable to embeddings later.
Auto-extraction with PII filter
After each assistant turn, a structured-output pass proposes durable items. Anything matching redaction placeholders ([EMAIL],[PHONE], …) is dropped before storage.
Agent-callable tools
memory_remember, memory_recall, memory_forget, plus memory_set / memory_get for a per-conversation JSON scratchpad shared across swarm nodes.

Per-agent

Memory tab in Agent Builder

Toggle STM/LTM, set the window size, top-K recall, max items, and inspect or delete remembered facts — all without code.

Per-swarm-node

Three LTM scopes

agent shares with the agent's normal sessions. swarm isolates to one run (uses swarm_run_id as the key). none turns LTM off entirely for that node.

Observable

Recall chip in chat

Every assistant message shows how many LTM items were recalled (and a preview), so you can see exactly what the agent "remembered" — no guessing.

Track 08 · Deep Dives

The six production gaps most curriculums skip

Orchestration architecture, deterministic skeletons, MCP security, Actor-Model swarms, heterogeneous routing economics, and voice agents (the STT→LLM→TTS loop and cloud reference architectures). Full lessons — including the L1→L5 autonomy mapping — live in the Deep Dives chapter inside the lessons.

After you finish

Then comes the real fun: shipping it to production

Finishing the curriculum gets you to a working agent. The next 12 months are about turning that into a system real users depend on. We mapped the whole journey — for builders and for the leaders who fund them.

Pick a real pilot

Narrow scope, measurable success, low blast-radius.

Build evals first

50+ case golden set wired into CI before you scale.

Harden it

OWASP LLM Top 10, guardrails, HITL on dangerous tools.

Observe everything

Traces, cost, drift, weekly failure review.

Choose where it runs

Bedrock, Azure AI Foundry, Vertex, AgentKit, edge — pick on data + skills, not hype.

Operate it

On-call, change management, model deprecations every 6–12 months.

Scale across the org

Platform team, FinOps chargeback, AI policy, regulatory mapping.

★

Persona checklists

30 / 90 / 365-day plans for both Builders and Leaders.

What's coming

The curriculum keeps growing

We're adding new tracks every few weeks. Here's what's on deck — vote with your feedback on the contact page.

Vector recall for memory

Upgrade Long-Term Memory from keyword overlap to embeddings — semantic recall, hybrid search, and when each one wins.

soon

Build-along projects

Multi-day guided projects: ship a customer-support agent, an internal data analyst, and a full RevOps swarm — end-to-end with case-study writeups.

soon

Red-team playbook for agents

Hands-on adversarial labs: jailbreaks, tool-chain hijacks, indirect prompt injection via RAG, and how to write the eval that catches each one.

soon

FinOps for agentic systems

Per-tenant chargeback, model cascading economics, prompt caching ROI, and budget caps that actually hold under load.

Beginner to production, one curriculum.

A clear path, in order

Five field manuals turn this curriculum into a senior-engineer reference.

Foundations of Generative & Agentic AI

Patterns, Tools & Guardrails

Agent Memory: Short-Term & Long-Term

Engineering Rigor & Deep Mental Models

Text-to-SQL & Data Agents

Multi-Agent Swarms

Scaling, Observability & Enterprise

Deep Dives — the production gaps most curriculums skip

A full notebook lab, in your browser — no setup

Foundations & Prompting

Failure Modes Lab

Agentic Evaluation

LangChain & LangGraph

LlamaIndex.ts

OpenAI Agents SDK

Vercel AI SDK

Google ADK (TypeScript)

Standalone Agents

Multi-Agent Systems (LangGraph.js)

Enterprise Ops & Safety

Real-world Examples

Don't just read about multi-agent systems — run real ones

Customer Support

Sales & RevOps

Research & Knowledge

Engineering & Quality

Failure-pattern labs (debugging)

Financial Services

Healthcare & Life Sciences

Legal, HR & Operations

Industry verticals

Marketing, Web & Content

Build agents you can talk to

Why a chatbot forgets you — and an agent doesn't

Think of it like a person at a desk

How it actually works under the hood

Memory tab in Agent Builder

Three LTM scopes

Recall chip in chat

The six production gaps most curriculums skip

Then comes the real fun: shipping it to production

Pick a real pilot

Build evals first

Harden it

Observe everything

Choose where it runs

Operate it

Scale across the org

Persona checklists

The curriculum keeps growing

Vector recall for memory

Build-along projects

Red-team playbook for agents

FinOps for agentic systems

Ready to start Track 01?