Full curriculum
Beginner to production — one curriculum.
Eight tracks. 50+ in-depth lessons. 50+ runnable real-world agents & swarms. Every concept is paired with a live demo you can fork in one click. All free.
- 8
- learning tracks
- 50+
- in-depth lessons
- 50+
- one-click agents & swarms
- 12+
- real-world case studies
Learning tracks
A clear path, in order
Start at Track 01 if you're new. Skip ahead if you've shipped with LLMs before — every track stands on its own.
Field manuals · Senior depth
Five field manuals turn this curriculum into a senior-engineer reference.
At the end of Foundations, Engineering Rigor, SQL & BI Agents, Production & Business, and RAG & Frameworks, long-form field manuals go one level below the chapter — tokenization economics, KV-cache math, schema-linking failure modes, EU AI Act obligations, Reciprocal Rank Fusion, embedding lifecycle, framework lock-in — with worked numerical examples and primary-source citations. If you only have time for one pass, the manuals are the difference between knowing the words and knowing the system.
- Track 01
Beginner
~3 hours
Foundations of Generative & Agentic AI
Start here if you've never built with LLMs. Every concept is explained twice: once like you're 10, once for the engineer in the room.
What's inside
- What is a model? (LLM families, base vs instruct, open vs closed)
- Tokens, context windows, and why they cost money
- Prompts, system messages, and few-shot patterns
- Embeddings, vector search, and the retrieval mindset
- What makes something an "agent" vs a chatbot
- Glossary of every term you'll see in the wild
Live templates included
- First Prompt Lab
- Token Counter Demo
- Track 02
Beginner → Intermediate
~4 hours
Patterns, Tools & Guardrails
The seven canonical agentic patterns — wired up live. Tool use, RAG, planner-executor, reflection, routing, parallel fan-out, and HITL approvals.
What's inside
- Tool / function calling — the OpenAI schema in plain English
- Retrieval-Augmented Generation (RAG) with citations
- Modern RAG variants: hybrid search, contextual retrieval, agentic & multi-modal
- Graph RAG — entities, relations, multi-hop reasoning (Microsoft GraphRAG style)
- Planner → Executor pattern
- Reflection & self-critique loops
- Routing & classifier-as-controller
- Parallel fan-out / map-reduce agents
- Human-in-the-Loop approvals for risky actions
- Input/output guardrails: PII, prompt injection, schema validation
Live templates included
- Product Support Bot (RAG)
- Graph RAG Researcher swarm (Acme Corp demo KB)
- Code Reviewer (Tools + Guardrails)
- Approval Inbox demo
- Planner-Executor sandbox
- Track 03
Beginner → Advanced
~2.5 hours
Agent Memory: Short-Term & Long-Term
Why a chatbot forgets you and an agent doesn't. The two memory tiers (STM and LTM), how recall actually works under the hood, and how to configure both per-agent and per-swarm-node — live, on the platform.
What's inside
- The mental model: scratchpad (STM) vs notebook (LTM)
- Sliding window + rolling summary — how STM survives long chats
- Long-term memory items: facts, preferences, episodic notes, instructions
- Recall: keyword overlap + score + recency (and where embeddings fit later)
- Auto-extraction after every turn — what gets saved, what gets skipped
- Memory tools the agent can call: remember, recall, forget, set, get
- Swarm scope: share memory with the agent, isolate to one run, or none
- PII safety, capacity caps, and pruning low-value items
Live templates included
- Personal Assistant with LTM
- Long-Conversation Tutor (STM summarizer)
- Swarm with shared scratchpad
- Track 04
Intermediate → Advanced
~3.5 hours
Engineering Rigor & Deep Mental Models
The senior-engineer view of agents: state, planning, multi-agent protocols, control topology, deterministic vs emergent design, failure handling, eval at scale, and system design under latency/cost constraints. With diagrams, code, and citations to the canonical papers.
What's inside
- The four axes: state, planning, communication, control topology
- Deterministic workflows vs emergent agentic loops (Anthropic's line)
- Failure handling: timeouts, jittered retries, circuit breakers, sagas
- Idempotency keys + structured-output validation with repair loop
- Loop detection and step / token / cost ceilings
- The 4-layer eval pyramid: unit → golden → trajectory → online
- System design under constraints: latency budgets, model cascading, caching, parallel tools
- Centralised vs hierarchical vs peer-to-peer vs market topologies
Live templates included
- Failure-handling diagrams
- Eval flywheel reference
- τ-bench / AgentBench links
- Track 05
Intermediate
~3 hours
Text-to-SQL & Data Agents
Turn natural language into safe SQL. AST validation, table allow-listing, schema-aware prompting, and the realities of running this in production (Uber QueryGPT-style).
What's inside
- Why text-to-SQL is harder than it looks
- Schema introspection and few-shot grounding
- AST parsing and validating generated SQL before execution
- Read-only enforcement and table allow-lists
- Cost & row-limit guardrails
- Case study: Uber QueryGPT, Snowflake Cortex Analyst
- Hands-on: query the SaaS sales lakehouse with English
Live templates included
- SQL Analyst Agent
- RevOps Multi-Agent Swarm
- Track 06
Intermediate → Advanced
~4 hours
Multi-Agent Swarms
When one agent isn't enough. Build researcher → writer → reviewer pipelines, peer-to-peer collaboration, A2A handoffs, and shared memory across the swarm.
What's inside
- Orchestrator vs peer-to-peer architectures
- Handoff messages, shared scratchpads, and turn limits
- A2A (Agent-to-Agent) protocol basics
- When to split a single agent into a swarm
- Per-node memory scoping (agent / swarm / none)
- Cost & loop-detection guardrails for swarms
- Visual swarm canvas — drag, wire, run
Live templates included
- Research → Writer → Reviewer swarm
- Customer Support Triage swarm
- RevOps SQL Analytics swarm
- Track 07
Advanced
~3 hours
Scaling, Observability & Enterprise
Production reality: traces, evals, ROI math, security, OpenAI-compatible gateways, multi-provider strategy. Real case studies from Klarna, Uber, Salesforce.
What's inside
- Reading execution traces and debugging cost spikes
- Production eval ops: regression gates in CI, drift alarms, weekly failure review
- Token, latency & cost dashboards
- AI security: prompt injection, data exfiltration, PII
- OpenAI-compatible gateways and multi-provider routing
- ROI formulas and enterprise cost scenarios
- Maturity model: from prototype to org-wide platform
- Case studies: Klarna, Uber, Salesforce Agentforce, BMW
Live templates included
- Trace Inspector
- Budget Caps demo
- Multi-Provider Gateway
- Track 08
Advanced → Expert
~4 hours
Deep Dives — the production gaps most curriculums skip
Five hard-won lessons from real production failures: orchestration architecture, deterministic skeletons vs probabilistic workers, MCP security (Confused Deputy + Tool Description Hijacking), Actor-Model swarms with durable state, and heterogeneous routing economics. Assumes you finished Tracks 01–07.
What's inside
- Hub-and-Spoke beats both monolithic master agents AND peer-to-peer mesh
- Thin Agent pattern: deterministic state machine + ephemeral <150-line workers
- MCP attack surface: Tool Description Hijacking, Confused Deputy, Shadow AI infra
- Actor Model runtimes: thousands of I/O-bound agents per host with durable checkpoints
- Heterogeneous routing: SLM routers + frontier-LLM escalation = positive ROI
- Mapping AgentSwarms to the Levels-of-Autonomy framework (L1 → L5)
Live templates included
- Frameworks deep dive (CrewAI / LangGraph / AutoGen)
- Stack examples by industry
One-click runnables
Don't just read about multi-agent systems — run real ones
Every track ships with production-shaped agents and swarms you can launch in a single click — wired with knowledge bases, SQL tools, RAG retrievers, guardrails, approval gates and observable traces. Open the canvas, fire the suggested prompt, watch each node light up in real time, then fork it and break it.
Live canvas
See every handoff, tool call & token as it streams
Real KBs
Pre-seeded help-center, research & ERP corpora
HITL approvals
Risky actions pause for a human, just like prod
Fork & remix
Edit the system prompt, swap models, re-run
Customer Support
2 swarms · 3 agentsMulti-agent swarms
Customer Support Triage
Classifier → Responder → QA reviewer with human approval
LLM-as-a-Judge — Support QA
Tone + Policy + Technical sub-judges → Chief Magistrate scorecard
Standalone agents
Product Support Assistant
Grounded RAG over a real help-center KB, with citations & refusals
Support Ticket Triage Agent
Auto-tags category + priority for incoming tickets
SLA Breach Detector
Calculates ticket age against SLA and flags every breach
Sales & RevOps
2 swarms · 2 agentsMulti-agent swarms
Sales Lead Enrichment
Intake → Enricher → Scorer → Email drafter (approval-gated)
SaaS RevOps — Multi-Agent SQL Analyst
VP prompt → SQL Planner → Local DB → RevOps Analyst → Strategic Synthesizer → Deal Desk
Standalone agents
Sales Outreach Drafter
Personalized outbound drafts — humans approve before sending
CRM Data Cleanser
Coerces messy text into strict JSON for HubSpot / Salesforce ingestion
Research & Knowledge
3 swarms · 3 agentsMulti-agent swarms
Research → Report Writer
Planner → Researcher → Synthesizer → Editor pipeline
Graph RAG Researcher
Graph search → Document RAG → Synthesizer for multi-hop questions
gRED-style Drug Discovery Research
Planner → Literature + Assay + Internal-data → Reconciliation → Hypothesis memo
Standalone agents
Research Q&A on AI Papers
Cited answers over a curated library of foundational AI papers
Long-Context Document Analyst
Claude-powered summarizer for contracts & long transcripts
Graph RAG Explorer (Acme Corp)
Multi-hop questions over a pre-built entity-relation knowledge graph
Engineering & Quality
3 swarms · 4 agentsMulti-agent swarms
Code Review Pipeline
Static summarizer → Security + Style reviewers → Merged comment
RAG Evaluation Harness — LLM as a Judge
Two candidates → GPT-5 judge → structured rubric scorecard
Content Moderation QA with Evaluate Node
Toxicity + Misinfo + Policy → Moderator → quality-gate Evaluate node
Standalone agents
Code Review Assistant
Reviews a pasted diff for bugs, style and security issues
Python Traceback Fixer
Reads a Python error log and writes the exact fix
Regex Generator
Translates plain English into ready-to-use regular expressions
SQL Dialect Translator
Converts queries between MySQL, Postgres, SQL Server, and SQLite
Failure-pattern labs (debugging)
3 swarmsMulti-agent swarms
Infinite Tool Loop
Watch an agent burn tokens calling the same tool forever — then see the fix
JSON Wrapper Crash
An LLM adds "Sure! Here's the JSON:" — the next agent throws SyntaxError
Context Window Collapse
Three workers dump 40 pages into one Synthesizer — watch it forget the question
Financial Services
4 swarmsMulti-agent swarms
Earnings Call Analyst Desk
Numbers + Tone + Risk → Compliance check → Analyst memo
Stock Investment — CIO Swarm
Fundamental + News + Quant + Risk → CIO investment memo
Responsible AI Guardrails (Banking)
PII Redactor → Safety Classifier → Guardrailed Responder ⊕ Refusal → Audit Log
Financial Variance — ERP + RAG
Orchestrator → ERP Data + RAG Doc Agent → FP&A Synthesizer
Healthcare & Life Sciences
2 swarmsMulti-agent swarms
Clinical Intake & Prior-Authorization
Symptom intake → Triage → Differential → Coding → Prior-auth → Clinician approval
Agentic RAG — Drug Safety Investigation
Router → parallel KB + Graph + SQL retrievers → Self-Eval loop → Synthesizer
Legal, HR & Operations
3 swarms · 1 agentMulti-agent swarms
Contract Redline & Risk Review
Clause splitter → parallel risk / definitions / jurisdiction reviewers → Partner approval
Frontline Hiring Orchestrator
Manager → Sourcing + Screener + Scheduler + Onboarding → Recruiter approval
Disaster Response — Crisis Triage at Scale
Intake classifier → Severity scorer → Resource matcher → Field-team router
Standalone agents
Meeting Action-Item Extractor
Pulls assigned tasks and deadlines out of long meeting transcripts
Industry verticals
5 swarmsMulti-agent swarms
Auto-Claims FNOL Triage (Insurance)
Intake → Coverage Check (RAG) → Fraud Signals (SQL) → Reserve & Routing
Manufacturing — Quality NCR Root-Cause
Defect intake → Spec lookup → History query → 5-Whys → CAPA draft
SOC Alert Triage (Cybersecurity)
Alert intake → ATT&CK enrichment → SQL correlation → Containment proposal
Retail Returns & Reverse-Logistics
Return intake → Policy RAG → Customer-history SQL → Disposition decision
Adaptive Socratic Tutor + Auto-Grader (Education)
Diagnose misconception → Curriculum RAG → Socratic hint → Grader
Marketing, Web & Content
2 swarms · 7 agentsMulti-agent swarms
Autonomous Localization & Compliance
Creative Director → parallel Copywriter + Designer → Compliance loop
Autonomous Ad Campaign Engine
Brief + reference photo → parallel copy & image gen → vision QA → approved ad
Standalone agents
Landing Page Roaster
Critiques marketing copy against direct-response copywriting frameworks
SEO Meta-Tag Generator
SEO titles + meta descriptions within strict character limits
Brand Voice Translator
Rewrites copy in any persona, from Gen-Z to Victorian novelist
Firecrawl Web Summarizer
Scrapes a live URL and answers a precise question about its content
Competitor Feature Tracker
Scrapes a changelog and surfaces the updates you care about
GitHub Repo Explainer
Reads a public README and explains it for a non-technical audience
Invoice Parser
Extracts vendor, totals, tax and due dates from pasted invoice text
Why this matters for learning: Agentic AI is a systems discipline — handoffs, loops, retries, guardrails, traces. You can't internalize that from prose. Every swarm above is the lesson made tangible: the moment you watch the QA reviewer reject the responder's draft and trigger a retry, or see the Evaluate node fail-close on a toxic generation, the theory clicks. Each runnable is the laboratory for the chapter that introduced it.
Spotlight · Agent Memory
Why a chatbot forgets you — and an agent doesn't
Memory is the single biggest leap from "stateless chatbot" to "real assistant." Here's the same concept explained two ways — once for beginners, once for engineers — with everything you can actually configure on AgentSwarms today.
Think of it like a person at a desk
An LLM has no memory by itself — every request starts from scratch. AgentSwarms gives every agent two memory aids:
Short-Term Memory (STM) — the scratchpad
Whatever was said recently in this chat. When the chat gets too long, older parts get squeezed into a one-paragraph summary so the agent never "forgets" the earlier topic.
Long-Term Memory (LTM) — the notebook
Durable notes the agent jots down across every conversation — your name, what you're working on, your preferences. Next week, in a brand-new chat, it still remembers.
In one sentence: STM is "what we just talked about," LTM is "what I know about you."
How it actually works under the hood
STM = sliding window + rolling summary
Last
Nmessages (default 20) are sent verbatim. Anything older is folded into a running summary stored onconversation_memory.summaryand prepended as a system block on every turn.LTM = typed memory items + scored recall
Items are
fact/preference/episodic/instruction. Recall ranks by keyword overlap (GIN index) + stored score + recency, top-K injected as a "What you remember about this user" block. Pluggable to embeddings later.Auto-extraction with PII filter
After each assistant turn, a structured-output pass proposes durable items. Anything matching redaction placeholders (
[EMAIL],[PHONE], …) is dropped before storage.Agent-callable tools
memory_remember,memory_recall,memory_forget, plusmemory_set/memory_getfor a per-conversation JSON scratchpad shared across swarm nodes.
Per-agent
Memory tab in Agent Builder
Toggle STM/LTM, set the window size, top-K recall, max items, and inspect or delete remembered facts — all without code.
Per-swarm-node
Three LTM scopes
agent shares with the agent's normal sessions. swarm isolates to one run (uses swarm_run_id as the key). none turns LTM off entirely for that node.
Observable
Recall chip in chat
Every assistant message shows how many LTM items were recalled (and a preview), so you can see exactly what the agent "remembered" — no guessing.
After you finish
Then comes the real fun: shipping it to production
Finishing the curriculum gets you to a working agent. The next 12 months are about turning that into a system real users depend on. We mapped the whole journey — for builders and for the leaders who fund them.
Pick a real pilot
Narrow scope, measurable success, low blast-radius.
Build evals first
50+ case golden set wired into CI before you scale.
Harden it
OWASP LLM Top 10, guardrails, HITL on dangerous tools.
Observe everything
Traces, cost, drift, weekly failure review.
Choose where it runs
Bedrock, Azure AI Foundry, Vertex, AgentKit, edge — pick on data + skills, not hype.
Operate it
On-call, change management, model deprecations every 6–12 months.
Scale across the org
Platform team, FinOps chargeback, AI policy, regulatory mapping.
Persona checklists
30 / 90 / 365-day plans for both Builders and Leaders.
What's coming
The curriculum keeps growing
We're adding new tracks every few weeks. Here's what's on deck — vote with your feedback on the contact page.
Vector recall for memory
Upgrade Long-Term Memory from keyword overlap to embeddings — semantic recall, hybrid search, and when each one wins.
Build-along projects
Multi-day guided projects: ship a customer-support agent, an internal data analyst, and a full RevOps swarm — end-to-end with case-study writeups.
Red-team playbook for agents
Hands-on adversarial labs: jailbreaks, tool-chain hijacks, indirect prompt injection via RAG, and how to write the eval that catches each one.
FinOps for agentic systems
Per-tenant chargeback, model cascading economics, prompt caching ROI, and budget caps that actually hold under load.