The AgentSwarms curriculum

Everything you need to build serious agentic AI.

This is the textbook we wish existed when we started. It scales from "what's a token?" to "how do I shadow-eval a multi-tenant swarm in production?" — same page, same vocabulary. Skim it like docs, study it like a course, or jump straight to the lab.

6 core concepts · 6 worked examples · 40+ use cases

Pick your path

Three ways through this curriculum

Weekend 1

Total Beginner — 'I've used ChatGPT, that's it'

  1. Read concept 01 (Prompts), then try changing the system prompt of a template
  2. Read concept 02 (RAG), then upload a PDF and ask 5 questions
  3. Skim concept 03 (Tools), then run the demo Research agent
  4. Stop. You already know more than most people who talk about agents.
Week 1-2

Builder — 'I've shipped a chatbot, want to go deeper'

  1. Read all 6 concepts in order and do every example
  2. Fork a template, swap models, compare traces
  3. Build your own swarm with 3 agents
  4. Add guardrails + an HITL approval gate
  5. Write your first 10-case eval suite
Ongoing

Advanced — 'I'm taking agents to production'

  1. Compare 3 providers on the same eval set; pick by cost + latency, not vibes
  2. Build a multi-tenant RAG system with namespaced vector stores
  3. Wire OpenTelemetry from your traces into your APM
  4. Design a HITL approval flow with <2-min p95 latency
  5. Run shadow-mode evals on every prompt change
Concept 01

Prompts & System Messages

The system prompt is your agent's constitution. Everything else — tools, RAG, swarms — sits on top of it.

Beginner — the intuition

A prompt is just text you send to the model. The 'system' prompt is a special, sticky instruction that tells the model who it is and how to behave. The 'user' prompt is what the human asks. Models read both as one big conversation. Change the system prompt and the same model will talk like a teacher, a lawyer, or a sarcastic pirate.
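Concretely, most chat APIs take a list of role-tagged messages. The sketch below assumes an OpenAI-style `{ role, content }` shape; `buildMessages` is a hypothetical helper, not a library function. It shows why the system prompt is "sticky": with a stateless chat API the client re-sends it at the front of every request.

```typescript
// Role-tagged messages, in the shape most chat-completion APIs accept.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical helper. With a stateless chat API the model remembers nothing
// between requests, so the system prompt is re-sent first on every call.
function buildMessages(
  systemPrompt: string,
  turns: ChatMessage[],
  userInput: string,
): ChatMessage[] {
  return [
    { role: "system", content: systemPrompt },
    // Drop any system-role messages from the history so only ours applies.
    ...turns.filter((m) => m.role !== "system"),
    { role: "user", content: userInput },
  ];
}

const pastTurns: ChatMessage[] = [
  { role: "user", content: "What is 2 + 2?" },
  { role: "assistant", content: "Arr, 'tis 4, matey." },
];

const messages = buildMessages(
  "You are a sarcastic pirate. Answer in one sentence.",
  pastTurns,
  "And 3 + 3?",
);
console.log(messages.map((m) => m.role).join(" -> "));
// system -> user -> assistant -> user
```

Swap the first argument for a teacher or lawyer persona and nothing else in the request changes.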

Advanced — the gotchas

System prompts are the cheapest, highest-leverage place to encode policies, output schemas, refusal rules, and persona. Treat them like configuration: version them, write evals against them, and never let users override them via prompt-injection. Pair with structured outputs (JSON schema mode) to make the model's contract enforceable, not aspirational. Few-shot exemplars belong in the system prompt only when role-shaping fails — otherwise they bloat tokens and reduce instruction-following.
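To make an output contract enforceable rather than aspirational, validate every response before anything downstream uses it. A minimal sketch, assuming the system prompt asks for a JSON object with hypothetical `answer` and `citations` fields; in production you would more likely use a JSON Schema validator or the provider's structured-output mode.

```typescript
// Hypothetical contract the system prompt asks the model to follow.
interface AnswerContract {
  answer: string;
  citations: string[];
}

type ContractResult =
  | { ok: true; value: AnswerContract }
  | { ok: false; error: string };

// Parse and check the raw model output; never trust it blindly.
function checkContract(raw: string): ContractResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "expected a JSON object" };
  }
  const { answer, citations } = data as Record<string, unknown>;
  if (typeof answer !== "string") {
    return { ok: false, error: "missing string field 'answer'" };
  }
  if (!Array.isArray(citations) || !citations.every((c) => typeof c === "string")) {
    return { ok: false, error: "'citations' must be an array of strings" };
  }
  return { ok: true, value: { answer, citations } };
}

console.log(checkContract("Sure! The policy is 14 days.")); // { ok: false, error: 'not valid JSON' }
```

A failed check is a signal to retry, fall back, or log a contract violation for your evals, rather than passing free text downstream.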

Worked example — A reusable system-prompt template
You are {{role}}, a helpful assistant for {{audience}}.

# Goals
- {{primary_goal}}
- Always cite sources when using retrieved context.

# Tone
- Friendly, concise, never condescending.

# Refusals
- If asked for medical, legal, or financial advice,
  acknowledge limits and suggest a professional.

# Output format
Respond in markdown. For lists, use "-".
For code, use fenced blocks with the language tag.
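The `{{placeholder}}` slots need filling before the prompt is sent. A minimal sketch of a renderer (`renderPrompt` is a hypothetical name) that fails loudly on a missing variable instead of silently shipping a literal `{{role}}` to the model:

```typescript
// Replace {{name}} slots with values; throw if any slot has no value.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match, name: string) => {
    const value = vars[name];
    if (value === undefined) {
      throw new Error(`Missing prompt variable: ${name}`);
    }
    return value;
  });
}

const template =
  "You are {{role}}, a helpful assistant for {{audience}}.\n\n# Goals\n- {{primary_goal}}";

const rendered = renderPrompt(template, {
  role: "StudyBuddy",
  audience: "first-year biology students",
  primary_goal: "Quiz the student with one question after every answer.",
});
console.log(rendered);
```

Keeping the template in version control and the variables in config lets you change a persona without touching the rules, and diff both when an eval regresses.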
In real life
  • A study buddy that always quizzes back with 1 question
  • A cooking assistant that converts units before answering
  • A journaling coach that mirrors your mood
In the enterprise
  • Brand-voice enforcement across 50+ marketing agents
  • Refusal policies for regulated content
  • Locale-aware compliance disclaimers
Common pitfalls
  • Stuffing it with examples instead of rules
  • Letting user input override system instructions
  • Forgetting to version it — drift kills evals
02
Concept 02

RAG & Knowledge Bases

Retrieval-Augmented Generation grounds the model in YOUR documents so answers come with citations instead of guesses.

Beginner — the intuition

LLMs are trained on the public internet. They don't know your company handbook or your textbook. RAG fixes that: we (1) chop your docs into chunks, (2) embed them as vectors, (3) at query time, find the most-similar chunks and (4) paste them into the prompt. The model now answers from real text it can cite — not memory.
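Steps (2) and (3) can be made concrete without any external service. This toy sketch deliberately swaps real embeddings for a bag-of-words count vector (production systems call a learned embedding model instead) and ranks chunks by cosine similarity; the chunk texts are made up.

```typescript
// Toy "embedding": word counts over a shared vocabulary.
// Real systems call an embedding model instead; the ranking logic is the same.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function embed(text: string, vocab: string[]): number[] {
  const counts = new Map<string, number>();
  for (const t of tokenize(text)) counts.set(t, (counts.get(t) ?? 0) + 1);
  return vocab.map((w) => counts.get(w) ?? 0);
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// (1) chunks and (2) their vectors, built once at index time.
const chunks = [
  "Refunds are issued within 14 days of a return.",
  "Our office is closed on public holidays.",
  "Shipping is free for orders over 50 dollars.",
];
const vocab = Array.from(new Set(chunks.flatMap(tokenize)));
const index = chunks.map((text) => ({ text, vector: embed(text, vocab) }));

// (3) at query time, rank chunks by similarity to the question.
function retrieve(query: string, k: number) {
  const q = embed(query, vocab);
  return index
    .map((c) => ({ text: c.text, score: cosine(q, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// (4) the top chunk is what gets pasted into the prompt.
console.log(retrieve("How long do refunds take?", 1)[0].text);
// Refunds are issued within 14 days of a return.
```

Word counts only match exact words; a learned embedding would also match "money back" to "refund", which is the whole point of step (2).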

Advanced — the gotchas

Chunking is the single biggest lever. Semantic chunking tends to outperform fixed-size chunking on narrative docs, while recursive character splitting usually works better for code. Re-rank the top-k results with a cross-encoder before stuffing them into the context; in practice this often cuts hallucinations noticeably. For multi-tenant RAG, namespace by tenant in your vector store and ALWAYS filter at query time, not in the prompt. Watch for retrieval failure modes: lost-in-the-middle (models under-use context placed mid-prompt), query/document mismatch (use HyDE or multi-query), and stale embeddings after an embedding-model upgrade (vectors from different models are not comparable, so re-embed everything).
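"Filter at query time, not in the prompt" means code enforces the tenant boundary before any similarity ranking happens, so another tenant's chunk can never reach the model. A minimal in-memory sketch; the `TenantVectorStore` class is hypothetical, and real vector stores expose the same idea as a namespace or metadata filter. Vectors are assumed unit-length, so the dot product equals cosine similarity.

```typescript
interface StoredChunk {
  tenantId: string;
  text: string;
  vector: number[]; // assumed unit-length
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

class TenantVectorStore {
  private chunks: StoredChunk[] = [];

  add(chunk: StoredChunk): void {
    this.chunks.push(chunk);
  }

  // tenantId is a required argument, not an optional filter: a query
  // can only ever see chunks from the caller's own tenant.
  search(tenantId: string, query: number[], k: number): StoredChunk[] {
    return this.chunks
      .filter((c) => c.tenantId === tenantId) // hard filter BEFORE ranking
      .map((c) => ({ c, score: dot(query, c.vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k)
      .map(({ c }) => c);
  }
}

const store = new TenantVectorStore();
store.add({ tenantId: "acme", text: "Acme refund policy", vector: [1, 0] });
store.add({ tenantId: "globex", text: "Globex refund policy", vector: [1, 0] });
store.add({ tenantId: "acme", text: "Acme holiday calendar", vector: [0, 1] });

console.log(store.search("acme", [1, 0], 5).map((c) => c.text));
// [ 'Acme refund policy', 'Acme holiday calendar' ]
```

The Globex chunk scores exactly as high as Acme's, yet it is never returned: the boundary does not depend on the model obeying an instruction.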

Worked example — Minimal RAG loop (pseudocode)
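// chunkDocument, embed, vectorStore, rerank, llm and docId are your own helpers
// 1. Index time: split, embed, and store each chunk WITH its text
const chunks = chunkDocument(doc, { size: 500, overlap: 50 }); // string[]
const vectors = await embed(chunks);                            // one vector per chunk
await vectorStore.upsert(
  chunks.map((text, i) => ({ id: `${docId}:${i}`, vector: vectors[i], text })),
);

// 2. Query time
async function answer(userQuestion: string) {
  const [queryVec] = await embed([userQuestion]); // embed() returns one vector per input
  const top = await vectorStore.search(queryVec, { k: 8 });
  const reranked = await rerank(userQuestion, top); // <- huge quality win

  const prompt = `
Answer using ONLY the context below. Cite as [1], [2].
Context:
${reranked.map((c, i) => `[${i + 1}] ${c.text}`).join("\n\n")}

Question: ${userQuestion}
`;
  return llm.chat(prompt);
}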
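The pseudocode calls a `chunkDocument(doc, { size: 500, overlap: 50 })` helper without defining it. One plausible reading, sketched here as an assumption, is fixed-size character windows in which each chunk repeats the last `overlap` characters of the previous one, so text near a boundary lands in both neighboring chunks.

```typescript
interface ChunkOptions {
  size: number;    // max characters per chunk
  overlap: number; // characters shared with the previous chunk
}

// Fixed-size character chunking with overlap (one possible reading of
// the pseudocode's chunkDocument; semantic or recursive splitters differ).
function chunkDocument(doc: string, { size, overlap }: ChunkOptions): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const step = size - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < doc.length; start += step) {
    chunks.push(doc.slice(start, start + size));
    if (start + size >= doc.length) break; // this window reached the end
  }
  return chunks;
}

const doc = "abcdefghij"; // 10 characters
console.log(chunkDocument(doc, { size: 4, overlap: 1 }));
// [ 'abcd', 'defg', 'ghij' ]
```

Real splitters usually measure size in tokens rather than characters and prefer to cut at sentence or paragraph boundaries, but the window-and-step arithmetic is the same.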
In real life
  • Q&A over a textbook you're studying
  • Search across all your saved Pocket articles
  • Family-recipe archive with semantic search
In the enterprise
  • Customer support over product docs (with citation links)
  • Legal-discovery assistant scoped to one matter
  • Internal HR/policy bot with audit-grade sources
Common pitfalls
  • Chunk size too large → retrieval is noisy
  • Forgetting to dedupe near-duplicate chunks
  • Trusting cosine similarity without re-ranking
Concept 03

Tools, Function Calling & MCP

Tools turn an LLM from a talker into a doer. MCP is becoming the standard wire-format for exposing them.

Beginner — the intuition

A 'tool' is just a function the model can choose to call. You describe it (name, params, what it does) in JSON. The model decides when to call it, you actually run it, and feed the result back. That's how agents check the weather, send emails, or query a database.
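That round trip is easy to see with the model stubbed out. A minimal sketch, assuming an OpenAI-style tool call with a `name` and a JSON-encoded `arguments` string; `fakeModel` stands in for a real LLM and the weather data is made up.

```typescript
// A tool call as emitted by the model: a name plus JSON-encoded arguments.
interface ToolCall {
  id: string;
  name: string;
  arguments: string;
}

type ModelTurn = { toolCall: ToolCall } | { text: string };

// Your code: the functions the model is allowed to call.
const tools: Record<string, (args: Record<string, unknown>) => unknown> = {
  get_weather: (args) => ({ city: String(args.city), tempC: 21, sky: "sunny" }), // fake data
};

// Stand-in for the LLM: asks for the tool first, then answers from its result.
function fakeModel(toolResult?: string): ModelTurn {
  if (toolResult === undefined) {
    return { toolCall: { id: "call_1", name: "get_weather", arguments: '{"city":"Berlin"}' } };
  }
  const w = JSON.parse(toolResult);
  return { text: `It is ${w.tempC}°C and ${w.sky} in ${w.city}.` };
}

// The loop: the model decides, YOUR code executes, the result goes back.
function runTurn(): string {
  let turn = fakeModel();
  for (let step = 0; step < 5; step++) { // cap the number of tool rounds
    if ("text" in turn) return turn.text;
    const call = turn.toolCall;
    const fn = tools[call.name];
    if (!fn) throw new Error(`Unknown tool: ${call.name}`);
    const result = fn(JSON.parse(call.arguments));
    turn = fakeModel(JSON.stringify(result));
  }
  throw new Error("Too many tool calls");
}

console.log(runTurn()); // It is 21°C and sunny in Berlin.
```

Note that the model never runs anything itself: it only produces the `ToolCall` object, and the runtime decides whether and how to execute it.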

Advanced — the gotchas

Design tools to be idempotent and explicit about their side effects. Always return structured results (not freeform strings) so downstream agents can parse them. Gate dangerous tools behind HITL approvals. MCP (Model Context Protocol) standardizes all of this, so the SAME tool server works with Claude Desktop, your custom agent, and any other compatible client: think USB-C for AI tools. Avoid mega-tools; prefer many small, composable tools. But tool-selection accuracy tends to drop as the visible list grows (a common rule of thumb is to stay under ~15 tools), so use a router agent to gate which tools are visible on each turn.
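A router that limits which tools are visible per turn can start very simple. This sketch makes loose assumptions: each tool carries hypothetical routing keywords (not part of any tool-calling API), and the router exposes only the best-matching few. A production router would more likely use embedding search or a small classifier model.

```typescript
interface ToolSpec {
  name: string;
  keywords: string[]; // hypothetical routing hints
}

// Score each tool by keyword overlap with the user message; expose only the top `max`.
function selectTools(message: string, all: ToolSpec[], max = 15): string[] {
  const words = new Set<string>(message.toLowerCase().match(/[a-z_]+/g) ?? []);
  return all
    .map((t) => ({ name: t.name, score: t.keywords.filter((k) => words.has(k)).length }))
    .filter((t) => t.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, max)
    .map((t) => t.name);
}

const catalog: ToolSpec[] = [
  { name: "get_weather", keywords: ["weather", "rain", "forecast"] },
  { name: "send_email", keywords: ["email", "send", "mail"] },
  { name: "create_ticket", keywords: ["ticket", "bug", "issue"] },
];

console.log(selectTools("Will it rain? Check the weather forecast", catalog, 2));
// [ 'get_weather' ]
```

Only the selected tools' definitions are then sent with the model call, which keeps the prompt short and the choice easy.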

Worked example — OpenAI-style tool definition
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {
      "type": "object",
      "properties": {
        "city":  { "type": "string", "description": "e.g. 'Berlin'" },
        "units": { "type": "string", "enum": ["c", "f"], "default": "c" }
      },
      "required": ["city"]
    }
  }
}
In real life
  • Calendar agent that books appointments
  • Smart-home agent that dims lights on movie night
  • Personal CRM that updates contacts after every call
In the enterprise
  • Salesforce / Jira / ServiceNow automation
  • Internal MCP server fronting your data warehouse
  • Approval-gated refunds, deletes, money movement
Common pitfalls
  • Vague descriptions → model picks the wrong tool
  • Letting the model see 50 tools at once
  • No timeouts → hung tool calls eat budget
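For the missing-timeout pitfall, a common fix is to race every tool call against a timer. A minimal sketch with a hypothetical `withTimeout` helper. `Promise.race` stops *waiting* but does not cancel the underlying work, so pair it with an `AbortSignal` wherever the tool supports one.

```typescript
// Reject if `work` hasn't settled within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // don't leave the timer pending after a fast result
  }
}

// A fake tool that takes 200 ms.
const slowTool = () => new Promise<string>((resolve) => setTimeout(() => resolve("done"), 200));

withTimeout(slowTool(), 50, "search_web")
  .then((r) => console.log("result:", r))
  .catch((e: Error) => console.log(e.message)); // search_web timed out after 50 ms
```

Returning the timeout as a structured error to the model (instead of hanging) lets it retry, pick another tool, or tell the user.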
Concept 04

Guardrails & Human-in-the-Loop

Production agents need brakes. Filters at the input, schemas at the output, humans for the scary stuff.

Beginner — the intuition

A guardrail is anything that says 'no' or 'wait'. Examples: redact emails before sending to the model (input filter), refuse to return a SQL DROP statement (output filter), or pause and ask a human before refunding $10,000 (HITL approval). They keep your agent safe and your users (and lawyers) happy.
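An input filter can be as small as a regular expression. A minimal sketch that masks email addresses before text reaches the model; the pattern is deliberately simple and will miss some valid addresses, so treat it as an illustration, not a compliance control.

```typescript
// Simplified email pattern: local@domain.tld (not full RFC 5322).
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Replace each address with a placeholder and report how many were redacted.
function redactEmails(input: string): { text: string; count: number } {
  let count = 0;
  const text = input.replace(EMAIL, () => {
    count++;
    return "[EMAIL]";
  });
  return { text, count };
}

const redacted = redactEmails("Please refund jane.doe@example.com and cc billing@acme.io.");
console.log(redacted.text);  // Please refund [EMAIL] and cc [EMAIL].
console.log(redacted.count); // 2
```

Logging the redaction count (not the addresses) gives you a trace signal without reintroducing the PII you just removed.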

Advanced — the gotchas

Layer guardrails: input validation → prompt-injection defense → output schema validation → policy classifier → HITL approval gate for high-risk actions. Treat prompt injection as inevitable, not preventable; design tools so that the worst-case unauthorized call is recoverable. For HITL, design for fast asynchronous approvals (Slack/email) rather than tool calls that block while they wait: a run that blocks too long hits its timeout and is killed. Track approval latency as a first-class metric.
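The layering can be expressed as an ordered list of checks in which the first non-"allow" verdict wins. A minimal sketch; each guard is a hypothetical toy (a real prompt-injection or policy check would be a classifier, not a regex), but the control flow is the point.

```typescript
type Verdict =
  | { kind: "allow" }
  | { kind: "block"; reason: string }
  | { kind: "needs_approval"; reason: string };

interface Action {
  userInput: string;
  tool: string;
  amountUsd?: number;
}

type Guard = (a: Action) => Verdict;

const allow: Verdict = { kind: "allow" };

// Toy guards, ordered cheapest-first.
const guards: Guard[] = [
  (a) => (a.userInput.length > 4000 ? { kind: "block", reason: "input too long" } : allow),
  (a) => (/ignore (all|previous) instructions/i.test(a.userInput)
    ? { kind: "block", reason: "possible prompt injection" }
    : allow),
  (a) => (a.tool === "issue_refund" && (a.amountUsd ?? 0) > 100
    ? { kind: "needs_approval", reason: "refund over $100" }
    : allow),
];

// Run every layer in order; the first non-allow verdict stops the action.
function evaluate(action: Action): Verdict {
  for (const guard of guards) {
    const v = guard(action);
    if (v.kind !== "allow") return v;
  }
  return allow;
}

console.log(evaluate({ userInput: "Refund my order", tool: "issue_refund", amountUsd: 250 }));
// { kind: 'needs_approval', reason: 'refund over $100' }
```

Because each layer is a plain function, each one can be unit-tested and traced on its own.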

Worked example — HITL approval pattern
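// Assumes `approvals` (the approvals client) and `stripe` (a configured Stripe client) exist
interface RefundArgs {
  customerId: string;
  paymentIntentId: string; // the Stripe PaymentIntent to refund
  amount: number;          // USD, e.g. 49.99
}

async function refundCustomer(args: RefundArgs) {
  // 1. Check policy: large refunds wait for a human
  if (args.amount > 1000) {
    const approval = await approvals.create({
      action_title: `Refund $${args.amount} to ${args.customerId}`,
      action_type:  "refund",
      risk_level:   "high",
      payload:      args,
    });
    return { status: "pending_approval", id: approval.id };
  }
  // 2. Auto-approve small refunds. Stripe takes a payment_intent (not a
  //    customer id) and the amount in the smallest currency unit (cents).
  const refund = await stripe.refunds.create({
    payment_intent: args.paymentIntentId,
    amount: Math.round(args.amount * 100),
  });
  return { status: "refunded", id: refund.id };
}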
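The `approvals.create` call above is left abstract. A minimal in-memory sketch of what could sit behind it (`ApprovalStore` is hypothetical, not the platform's API): requests start as pending, a human decision resolves them, and the store records decision latency so p95 approval time can be tracked as a first-class metric.

```typescript
type Status = "pending" | "approved" | "rejected";

interface Approval {
  id: string;
  actionTitle: string;
  status: Status;
  createdAt: number; // ms since epoch
  decidedAt?: number;
}

class ApprovalStore {
  private items = new Map<string, Approval>();
  private nextId = 1;

  create(actionTitle: string, now = Date.now()): Approval {
    const a: Approval = { id: `apr_${this.nextId++}`, actionTitle, status: "pending", createdAt: now };
    this.items.set(a.id, a);
    return a;
  }

  // A human approves or rejects; only pending requests can be decided.
  decide(id: string, approve: boolean, now = Date.now()): Approval {
    const a = this.items.get(id);
    if (!a) throw new Error(`Unknown approval ${id}`);
    if (a.status !== "pending") throw new Error(`${id} already ${a.status}`);
    a.status = approve ? "approved" : "rejected";
    a.decidedAt = now;
    return a;
  }

  // p95 of (decidedAt - createdAt) over decided requests, nearest-rank method.
  p95LatencyMs(): number | undefined {
    const latencies = Array.from(this.items.values())
      .filter((a) => a.decidedAt !== undefined)
      .map((a) => (a.decidedAt as number) - a.createdAt)
      .sort((x, y) => x - y);
    if (latencies.length === 0) return undefined;
    const rank = Math.ceil(0.95 * latencies.length); // 1-based nearest rank
    return latencies[rank - 1];
  }
}

const inbox = new ApprovalStore();
const req = inbox.create("Refund $2500 to cus_42", 0);
inbox.decide(req.id, true, 90000); // a human approved it 90 s later
console.log(inbox.p95LatencyMs()); // 90000
```

In a real system the store would be a database table and the decision would arrive from Slack or email, but the state machine and the metric stay the same.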
In real life
  • Email-drafting agent that pauses before SENDING
  • Smart-home agent that asks before unlocking the door
  • Trading bot that won't execute over $X without you
In the enterprise
  • PII redaction for GDPR/HIPAA compliance
  • SOC2-compliant approval workflows
  • Policy-as-code with OPA / Cedar integration
Common pitfalls
  • Trusting model self-policing ('please don't do X')
  • Approval queues that take days → agents abandoned
  • No rollback path when a guardrail fires mid-flow
Concept 05

Multi-Agent Swarms

One agent is a worker. A swarm is a team. Routers delegate, workers specialize, reviewers verify.

Beginner — the intuition

Imagine a research project. You wouldn't ask one person to find sources, write the report, AND fact-check it. A swarm splits those jobs: a Researcher agent finds info, a Writer agent drafts, a Reviewer agent checks. Each one is simpler and better at its job. They pass messages between each other.

Advanced — the gotchas

Two dominant patterns: (1) Orchestrator-workers — a central router decides who works next, gives clean handoffs, easy to trace; (2) Peer-to-peer — agents broadcast and self-organize, more emergent but harder to debug. Start with orchestrator. Use shared scratchpad memory (a typed object) for state between handoffs rather than stuffing prior messages. Watch for cascading hallucinations: a downstream agent treating an upstream agent's guess as fact. Mitigate with structured outputs + verifier agents on critical paths.
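A verifier on a critical path does not always need another LLM; some checks are deterministic. A minimal sketch, assuming the Researcher returns sources with ids and the Writer's typed output lists the source ids it cites (both shapes are hypothetical). Any citation the Researcher never produced is flagged as a likely cascading hallucination.

```typescript
// Typed handoff objects, instead of free text between agents.
interface Source {
  id: string;
  url: string;
}
interface ResearchOutput {
  sources: Source[];
  notes: string;
}
interface Draft {
  markdown: string;
  citations: string[]; // ids of sources the writer relied on
}

// Every citation must point at a source the researcher returned,
// and a draft with no citations at all also fails.
function verifyCitations(draft: Draft, research: ResearchOutput): { ok: boolean; unknown: string[] } {
  const known = new Set(research.sources.map((s) => s.id));
  const unknown = draft.citations.filter((id) => !known.has(id));
  return { ok: unknown.length === 0 && draft.citations.length > 0, unknown };
}

const research: ResearchOutput = {
  sources: [{ id: "s1", url: "https://example.com/a" }],
  notes: "summary of findings",
};
const draft: Draft = { markdown: "Claim [s1]. Other claim [s7].", citations: ["s1", "s7"] };

console.log(verifyCitations(draft, research)); // { ok: false, unknown: [ 's7' ] }
```

This only works because the handoff is typed: with free-text handoffs there is no citation list to check.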

Worked example — Researcher → Writer → Reviewer pattern
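// Orchestrator pseudocode
const MAX_REVISIONS = 2; // cap rewrites: no infinite writer <-> reviewer ping-pong

const research = await researcher.run({ topic });
//  research = { sources: [...], notes: "..." }

let draft = await writer.run({ research });
//  draft = { markdown: "...", citations: [...] }

for (let round = 0; round <= MAX_REVISIONS; round++) {
  const review = await reviewer.run({ draft, sources: research.sources });
  //  review = { approved: boolean, issues: [...] }
  if (review.approved) return draft;
  if (round < MAX_REVISIONS)
    draft = await writer.run({ research, draft, feedback: review.issues });
}
throw new Error("Draft not approved after max revisions; escalate to a human");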
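With a central router instead of a fixed sequence, the "global timeout" becomes a hop budget around the routing loop. A synchronous toy sketch: the agents are plain functions standing in for LLM calls, the shared state is a typed scratchpad, and `MAX_HOPS` is an assumed limit.

```typescript
// Shared, typed scratchpad passed between agents instead of raw chat history.
interface Scratchpad {
  topic: string;
  notes?: string;
  draft?: string;
  approved?: boolean; // undefined = not reviewed yet
  log: string[];      // which agent ran at each hop
}

type AgentName = "researcher" | "writer" | "reviewer";
type Agent = (pad: Scratchpad) => Scratchpad;

// Stand-in agents; real ones would call an LLM and return typed output.
const agents: Record<AgentName, Agent> = {
  researcher: (p) => ({ ...p, notes: `notes on ${p.topic}` }),
  writer: (p) => ({ ...p, draft: `Draft based on ${p.notes}`, approved: undefined }),
  reviewer: (p) => ({ ...p, approved: (p.draft ?? "").length > 0 }),
};

// Router: pick the next agent from the scratchpad state, or stop.
function route(p: Scratchpad): AgentName | "done" {
  if (!p.notes) return "researcher";
  if (!p.draft) return "writer";
  if (p.approved === undefined) return "reviewer";
  return p.approved ? "done" : "writer"; // rejected drafts go back for revision
}

const MAX_HOPS = 10; // global budget: stops infinite writer <-> reviewer ping-pong

function orchestrate(topic: string, team: Record<AgentName, Agent> = agents): Scratchpad {
  let pad: Scratchpad = { topic, log: [] };
  for (let hop = 0; hop < MAX_HOPS; hop++) {
    const next = route(pad);
    if (next === "done") return pad;
    pad = team[next](pad);
    pad = { ...pad, log: [...pad.log, next] };
  }
  throw new Error(`No approved draft after ${MAX_HOPS} hops; escalate to a human`);
}

console.log(orchestrate("solar panels").log); // [ 'researcher', 'writer', 'reviewer' ]
```

Swap in a reviewer that always rejects and the loop stops with an error after ten hops instead of running forever.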
In real life
  • Trip planner: search → budget → itinerary
  • Newsletter swarm: scout → write → fact-check
  • Job hunt: scrape jobs → tailor resume → cover letter
In the enterprise
  • Underwriting pipeline: extract → score → review
  • RFP response: parse → draft → legal review → format
  • Multi-step ops automation with HITL approvals
Common pitfalls
  • Splitting too early — 1 good agent beats 3 confused ones
  • Loose handoffs (free text instead of typed objects)
  • No global timeout → infinite agent ping-pong
Concept 06

Observability & Evals

If you can't trace it, you can't trust it. If you can't eval it, you can't ship it.

Beginner — the intuition

Every agent run produces a 'trace' — the prompt, the response, tokens used, tools called, cost, latency. Looking at traces is how you debug. 'Evals' are little tests: did the answer cite the right doc? Was it under 200 words? Did it refuse the bad request? Run evals on every change so you don't break things.
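A trace is just a structured record, and cost falls out of its token counts. A minimal sketch; the `Trace` fields mirror the list above, while the model name and per-million-token prices are made-up placeholders, not any provider's real rates.

```typescript
interface Trace {
  model: string;
  prompt: string;
  response: string;
  inputTokens: number;
  outputTokens: number;
  toolCalls: string[];
  latencyMs: number;
}

// Hypothetical prices in USD per 1M tokens; look up your provider's actual pricing.
const PRICES: Record<string, { input: number; output: number }> = {
  "example-model": { input: 2.5, output: 10 },
};

function costUsd(t: Trace): number {
  const p = PRICES[t.model];
  if (!p) throw new Error(`No price for model ${t.model}`);
  return (t.inputTokens * p.input + t.outputTokens * p.output) / 1_000_000;
}

const trace: Trace = {
  model: "example-model",
  prompt: "What is our refund policy?",
  response: "Refunds are issued within 14 days [1].",
  inputTokens: 1200,
  outputTokens: 300,
  toolCalls: ["query_knowledge_base"],
  latencyMs: 850,
};

console.log(costUsd(trace).toFixed(4)); // 0.0060
```

Here 1,200 input tokens cost $0.003 and 300 output tokens cost another $0.003: output tokens are often priced several times higher than input tokens, so long answers can dominate the bill.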

Advanced — the gotchas

Evals come in three flavors: (1) deterministic checks (regex, JSON schema, citation presence), (2) LLM-as-judge (cheap, noisy — always sample-validate against humans), (3) human-graded golden sets (gold standard, expensive). Build all three. Track regressions per-prompt-version, per-model. Cost & latency are first-class quality metrics — a correct answer that costs $5 and takes 30s is a bug. Wire traces into your existing observability stack (OpenTelemetry → Datadog/Honeycomb).
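"Always sample-validate against humans" can be made measurable: have humans grade a sample the judge also graded, then compute agreement. A minimal sketch for pass/fail labels that reports raw agreement and Cohen's kappa, which discounts the agreement you would expect by chance. The sample is made up, and the threshold that counts as "good enough" is a judgment call.

```typescript
interface Graded {
  judge: boolean; // LLM-as-judge verdict (pass = true)
  human: boolean; // human verdict on the same output
}

function agreement(sample: Graded[]): { rate: number; kappa: number } {
  const n = sample.length;
  if (n === 0) throw new Error("empty sample");
  const po = sample.filter((s) => s.judge === s.human).length / n; // observed agreement

  // Chance agreement from each rater's own pass rate.
  const pJudge = sample.filter((s) => s.judge).length / n;
  const pHuman = sample.filter((s) => s.human).length / n;
  const pe = pJudge * pHuman + (1 - pJudge) * (1 - pHuman);

  // pe === 1 only when both raters gave one identical label to everything.
  const kappa = pe === 1 ? 1 : (po - pe) / (1 - pe);
  return { rate: po, kappa };
}

// 10 sampled outputs graded by both the judge and a human.
const sample: Graded[] = [
  ...Array.from({ length: 6 }, () => ({ judge: true, human: true })),
  ...Array.from({ length: 2 }, () => ({ judge: false, human: false })),
  { judge: true, human: false },
  { judge: true, human: false },
];
console.log(agreement(sample));
```

Raw agreement here is 0.8, but kappa is only about 0.55: the judge passes 80% of outputs, so much of that agreement would happen by chance, and both disagreements are the judge being too lenient.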

Worked example — A tiny eval suite
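import { strict as assert } from "node:assert";

// runAgent is your own harness; it is assumed to return a trace with
// citations, response text, and toolCalls (each with parsed args).
const cases = [
  { q: "What is our refund policy?",
    must_cite: "policies/refunds.md",
    must_not: ["I don't know", "as an AI"] },
  { q: "Cancel my account",
    expect_tool: "create_approval",
    expect_risk: "high" },
];

for (const c of cases) {
  const trace = await runAgent(c.q);
  if (c.must_cite)
    assert(trace.citations.includes(c.must_cite), `${c.q}: missing citation`);
  for (const phrase of c.must_not ?? [])
    assert(!trace.response.includes(phrase), `${c.q}: forbidden phrase "${phrase}"`);
  if (c.expect_tool) {
    const call = trace.toolCalls.find((t) => t.name === c.expect_tool);
    assert(call, `${c.q}: expected a ${c.expect_tool} call`);
    assert.equal(call.args.risk_level, c.expect_risk);
  }
}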
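"Track regressions per-prompt-version" boils down to diffing two runs of the same suite. A minimal sketch that assumes each run is recorded as a map from case id to pass/fail; the case ids and version names are illustrative.

```typescript
type RunResults = Record<string, boolean>; // case id -> passed?

// Cases that passed on the baseline but fail (or are missing) on the candidate.
function regressions(baseline: RunResults, candidate: RunResults): string[] {
  return Object.keys(baseline).filter((id) => baseline[id] && !candidate[id]);
}

function passRate(r: RunResults): number {
  const vals = Object.values(r);
  return vals.length === 0 ? 0 : vals.filter(Boolean).length / vals.length;
}

const v3: RunResults = { refund_policy: true, cancel_account: true, greeting: false };
const v4: RunResults = { refund_policy: true, cancel_account: false, greeting: true };

console.log(passRate(v3) === passRate(v4)); // true: both pass 2 of 3
console.log(regressions(v3, v4));           // [ 'cancel_account' ]
```

The aggregate pass rate is unchanged, yet v4 broke the high-risk approval case: exactly the silent regression that per-case tracking catches.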
In real life
  • Catch your tutor when it goes off-topic
  • Track which prompts burn the most credits
In the enterprise
  • SLA monitoring on agent latency
  • Cost attribution per team / customer
  • Audit trails for SOC2 / HIPAA / GDPR
Common pitfalls
  • 'Vibes-based' evals → silent regressions
  • Logging without redaction → PII leak
  • Tracking accuracy but ignoring cost & latency
Deep dive · Build pathways

Different ways to build agents — open-source frameworks compared

Once you understand the building blocks (prompt → RAG → tools → guardrails → swarms), the next question is "what do I actually use to build this?" There are four broad pathways. Pick by your team's skills and how much control you need — not by hype.

Hand-rolled (no framework)

You want to truly understand what's happening, or you have one simple use case.

Pros
  • Zero dependencies
  • Full control of every prompt + token
  • Easy to debug
Cons
  • You re-invent retries, tool routing, tracing, memory
  • Hard to scale beyond 1–2 agents

Code-first framework (LangChain, LlamaIndex, AutoGen, Pydantic AI)

You're a developer shipping production agents with custom logic.

Pros
  • Reusable abstractions
  • Big ecosystem of tools + integrations
  • Version-controlled in git
Cons
  • Learning curve
  • Abstractions can hide the prompt
  • Frequent breaking changes

Visual / no-code (n8n, Flowise, Langflow, Dify)

You want non-engineers to compose flows, or you need fast internal automations.

Pros
  • Drag-and-drop graphs
  • Great for ops, marketing, support teams
  • Visual debugging
Cons
  • Hits a ceiling on complex logic
  • Harder to test / version-control
  • Vendor lock-in for hosted ones

AgentSwarms (this platform)

You want the visual benefits + a real backend + open-source export — without giving up code.

Pros
  • Visual swarm builder backed by a typed runtime
  • BYO model: OpenAI, Gemini, Claude, Grok, Qwen, Bedrock, Vertex, OCI, Azure
  • Full traces, costs, evals, and HITL approvals
  • Export any swarm to a portable .swarm.json — no lock-in
Cons
  • Hosted lab (you're not running the runtime yourself, yet)

Side-by-side: the major open-source frameworks

All of these are free and open-source. Most are Python-first, a few have strong JS/TS or .NET stories. None of them are "best" — they're optimized for different jobs.

| Framework | Language | Best for | Who typically uses it |
| --- | --- | --- | --- |
| LangChain / LangGraph | Python · JS/TS | Rapid prototyping, RAG pipelines, multi-step graphs with explicit state | Teams shipping production RAG + multi-agent workflows |
| LlamaIndex | Python · TS | Anything where retrieval quality is the #1 metric | Doc-QA, knowledge assistants, research copilots |
| CrewAI | Python | Multi-agent collaboration with clear roles and tasks | Content ops, research swarms, marketing automations |
| AutoGen (Microsoft) | Python · .NET | Agents that talk to each other and write/run code | R&D, code-generation pipelines, complex task decomposition |
| OpenAI Agents SDK | Python · JS | Production agents on OpenAI/compatible models with minimal magic | Teams that have already standardized on OpenAI/Azure OpenAI |
| Pydantic AI | Python | Backend devs who want validated I/O and dependency injection | Production backends that already use FastAPI/Pydantic |
| Haystack (deepset) | Python | Enterprise search, hybrid retrieval, document Q&A at scale | Enterprises building internal search & QA systems |
| Semantic Kernel (Microsoft) | C# · Python · Java | Enterprise .NET/Java shops integrating LLMs into existing apps | Microsoft-stack enterprises adopting AI features |

LangChain / LangGraph

Python · JS/TS

The Swiss army knife. Chains, agents, and a graph runtime.

Strengths
  • Huge ecosystem of integrations (200+ across vector stores, models, and tools)
  • LangGraph adds a real state machine with checkpoints + HITL
  • First-class observability via LangSmith
Trade-offs
  • Heavy abstractions can hide what the LLM actually sees
  • Frequent breaking changes — pin versions
  • Easy to over-engineer simple chatbots

LlamaIndex

Python · TS

RAG-first framework. Data → index → query, batteries included.

Strengths
  • Best-in-class document loaders, parsers, and indexing strategies
  • Advanced retrieval: hybrid, recursive, sub-question, agentic
  • Workflows API for event-driven multi-agent flows
Trade-offs
  • Fewer built-in components for non-RAG agent patterns
  • API surface is large and evolving

CrewAI

Python

Role-based crews. 'A team of agents with jobs and a boss.'

Strengths
  • Intuitive: Agent + Task + Crew is easy to teach
  • Sequential and hierarchical processes out of the box
  • Plays nicely with LangChain tools
Trade-offs
  • Less control than building the orchestration yourself
  • Fewer production-grade observability hooks

AutoGen (Microsoft)

Python · .NET

Conversational multi-agent framework with code-execution.

Strengths
  • Strong multi-agent chat patterns (group chat, nested chat)
  • Built-in code executor and human proxy agent for HITL
  • Backed by Microsoft Research
Trade-offs
  • Free-form chat handoffs can be hard to debug at scale
  • Steeper learning curve than CrewAI

OpenAI Agents SDK

Python · JS

Lightweight, opinionated. Built around handoffs + guardrails.

Strengths
  • Tiny API surface — handoffs, guardrails, tracing
  • Native streaming + structured outputs
  • Excellent default tracing UI
Trade-offs
  • Tighter coupling to the OpenAI Responses API
  • Smaller ecosystem than LangChain

Pydantic AI

Python

Type-safe agents for the FastAPI generation.

Strengths
  • Pydantic everywhere — inputs, outputs, tool schemas
  • Model-agnostic (OpenAI, Anthropic, Gemini, local)
  • Great DX for testing and mocking
Trade-offs
  • Younger ecosystem; fewer pre-built integrations
  • Python-only today

Haystack (deepset)

Python

Production search + RAG pipelines, pipeline-graph first.

Strengths
  • Pipeline graphs are explicit and serializable (YAML)
  • Strong on hybrid search, evals, and deployment
  • Mature, used in regulated industries
Trade-offs
  • Less focus on free-form 'agentic' loops
  • Heavier than CrewAI for small projects

Semantic Kernel (Microsoft)

C# · Python · Java

Agent framework for .NET / Java / Python with planners.

Strengths
  • First-class .NET and Java support — rare in this space
  • Plugins, planners, and memory abstractions
  • Tight Azure integration
Trade-offs
  • Smaller community than Python-first frameworks
  • Concepts (planners, plugins) take time to click
In this platform

How AgentSwarms builds agents

AgentSwarms is the visual + code-friendly middle ground. Under the hood, every agent is a row in a database with a system prompt, a model, optional tools, and an optional knowledge base. Every swarm is a typed graph of those agents with routed handoffs. Nothing proprietary — you can export and run it elsewhere.

A single agent

Go to Agents → New Agent. Pick a provider (Lovable AI, OpenAI, Gemini, Anthropic, Grok, Bedrock, Vertex, OCI, Qwen, Azure), choose a model, write a system prompt, attach a knowledge base, enable tools, set spend caps. That's it — your agent is callable from the Playground.

A multi-agent swarm

Go to Swarms → New Swarm. Drag agent nodes, router nodes, guardrail nodes, and tool nodes onto the canvas. Wire them with edges (the typed handoffs). Hit Run to stream traces live, or Export to get a portable .swarm.json file you can import into another instance.

Worked example — the AgentSwarms portable schema
{
  "schemaVersion": "1.0.0",
  "name": "Research Swarm",
  "nodes": [
    {
      "id": "researcher",
      "type": "agent",
      "agent": {
        "provider": "openai",
        "model": "gpt-5",
        "systemPrompt": "You find sources and return JSON.",
        "tools": ["search_web", "fetch_url"]
      }
    },
    {
      "id": "writer",
      "type": "agent",
      "agent": { "provider": "anthropic", "model": "claude-3.7", ... }
    },
    { "id": "reviewer", "type": "agent", "agent": { ... } }
  ],
  "edges": [
    { "from": "researcher", "to": "writer" },
    { "from": "writer",     "to": "reviewer" }
  ]
}
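Because the export is plain JSON, a few lines of code can consume it. A minimal sketch that relies only on the `nodes[].id` and `edges[].from/to` fields shown above (the types are a hypothetical partial model of the schema, not its official definition). It checks that every edge points at a real node and derives a run order with Kahn's topological sort. This simple reading assumes an acyclic graph, like the example, and rejects cycles.

```typescript
interface SwarmNode {
  id: string;
  type: string;
}
interface SwarmEdge {
  from: string;
  to: string;
}
interface Swarm {
  schemaVersion: string;
  name: string;
  nodes: SwarmNode[];
  edges: SwarmEdge[];
}

// Kahn's algorithm: repeatedly take nodes with no unprocessed incoming edges.
function executionOrder(swarm: Swarm): string[] {
  const ids = new Set(swarm.nodes.map((n) => n.id));
  if (ids.size !== swarm.nodes.length) throw new Error("Duplicate node id");
  const indegree = new Map<string, number>();
  const next = new Map<string, string[]>();
  for (const n of swarm.nodes) {
    indegree.set(n.id, 0);
    next.set(n.id, []);
  }
  for (const e of swarm.edges) {
    if (!ids.has(e.from) || !ids.has(e.to)) {
      throw new Error(`Edge ${e.from} -> ${e.to} references an unknown node`);
    }
    next.get(e.from)!.push(e.to);
    indegree.set(e.to, indegree.get(e.to)! + 1);
  }
  const queue = swarm.nodes.map((n) => n.id).filter((id) => indegree.get(id) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    order.push(id);
    for (const to of next.get(id)!) {
      indegree.set(to, indegree.get(to)! - 1);
      if (indegree.get(to) === 0) queue.push(to);
    }
  }
  if (order.length !== ids.size) throw new Error("Swarm graph has a cycle");
  return order;
}

const swarm: Swarm = {
  schemaVersion: "1.0.0",
  name: "Research Swarm",
  nodes: [
    { id: "reviewer", type: "agent" },
    { id: "researcher", type: "agent" },
    { id: "writer", type: "agent" },
  ],
  edges: [
    { from: "researcher", to: "writer" },
    { from: "writer", to: "reviewer" },
  ],
};
console.log(executionOrder(swarm)); // [ 'researcher', 'writer', 'reviewer' ]
```

The order comes from the edges, not from the order of the `nodes` array, which is what makes the file portable across runtimes.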

Because every swarm exports to this schema, anything you build here can be re-implemented in LangGraph, CrewAI, or hand-rolled code in an afternoon. No lock-in.

Deep dive · Tools

Tools — the deep dive

Concept 03 introduced tools. This section goes one level deeper: the categories of tools you'll actually build, the lifecycle of a single tool call, and how to design tools that don't blow up in production.

The 6 categories of agent tools

Every tool you'll ever build falls into one of these buckets. Knowing the bucket tells you how to design it (idempotent? gated? cached?) and how risky it is.

Information / Retrieval tools

Read-only tools that fetch facts the model doesn't have.

search_web · fetch_url · query_knowledge_base · get_weather · lookup_user

Why it matters: Cuts hallucinations. The model stops guessing and starts citing.

Action tools (write / mutate)

Tools that change state in another system.

send_email · create_ticket · update_crm_record · issue_refund · deploy_service

Why it matters: Turns the agent from advisor into operator. Always gate dangerous ones with HITL.

Computation tools

Deterministic helpers that LLMs are bad at on their own.

calculator · run_sql · execute_python · convert_units · parse_pdf

Why it matters: Math, code, and parsing are deterministic — never trust an LLM to do them in its head.

Memory tools

Read/write the agent's long-term store.

save_fact · recall_fact · update_user_preference · list_recent_conversations

Why it matters: Lets agents learn across sessions instead of starting from zero each time.

Handoff / orchestration tools

Tools that route work to another agent.

transfer_to_specialist · ask_reviewer_agent · spawn_sub_swarm

Why it matters: The wiring of multi-agent swarms — a handoff is just a tool call under the hood.

Human-in-the-loop tools

Tools that pause the agent and wait for a human decision.

request_approval · ask_user_confirmation · escalate_to_oncall

Why it matters: Your safety net for irreversible or high-cost actions.

The lifecycle of a single tool call

A "tool call" is not just function(args). It's a six-step round-trip between the model and your runtime. Skip a step and you'll ship bugs that look like LLM hallucinations but are actually plumbing.

  1. Step 1: Describe

    You define the tool's name, params, and a one-sentence description. The model sees only this, so make it crisp.

  2. Step 2: Expose

    The runtime sends the tool list with every model call. Keep the list small (<15) per turn for best accuracy.

  3. Step 3: Decide

    The model emits a tool_call with structured arguments: no execution yet, just intent.

  4. Step 4: Validate

    Your runtime validates args (schema, policy, budget, HITL gate) before doing anything.

  5. Step 5: Execute

    Run the tool. Apply timeouts, retries, and observability. Capture cost + latency.

  6. Step 6: Return

    Send a structured tool_result back to the model. It plans the next step or replies to the user.
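Steps 4 to 6 are the runtime's job and fit in one function. A minimal synchronous sketch under stated assumptions: the `RegisteredTool` shape, the required-field check (standing in for full JSON Schema validation), and the approval hook are all hypothetical. Timeouts and retries are left out to keep it synchronous.

```typescript
interface RegisteredTool {
  name: string;
  required: string[]; // required argument names
  needsApproval?: (args: Record<string, unknown>) => boolean;
  run: (args: Record<string, unknown>) => unknown;
}

type ToolResult =
  | { status: "ok"; data: unknown; latencyMs: number }
  | { status: "invalid_args"; missing: string[] }
  | { status: "pending_approval" }
  | { status: "error"; message: string };

function handleToolCall(tool: RegisteredTool, rawArgs: string): ToolResult {
  // Step 4: Validate before doing anything.
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawArgs);
  } catch {
    return { status: "error", message: "arguments are not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { status: "error", message: "arguments must be a JSON object" };
  }
  const args = parsed as Record<string, unknown>;
  const missing = tool.required.filter((k) => !(k in args));
  if (missing.length > 0) return { status: "invalid_args", missing };
  if (tool.needsApproval?.(args)) return { status: "pending_approval" };

  // Step 5: Execute, capturing latency and never letting a throw escape.
  const start = Date.now();
  try {
    const data = tool.run(args);
    // Step 6: Return a structured result the model can read.
    return { status: "ok", data, latencyMs: Date.now() - start };
  } catch (e) {
    return { status: "error", message: e instanceof Error ? e.message : String(e) };
  }
}

const getWeather: RegisteredTool = {
  name: "get_weather",
  required: ["city"],
  run: (args) => ({ city: args.city, tempC: 21 }), // fake data
};
console.log(handleToolCall(getWeather, '{"units":"c"}'));
// { status: 'invalid_args', missing: [ 'city' ] }
```

Every branch returns a structured result instead of throwing, so the model always gets something it can act on, and the trace records why a call did or did not run.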

Worked example — a well-described tool
{
  "name": "issue_refund",
  "description": "Refund a customer order. Use ONLY when the user explicitly asks for a refund and you have an order_id. Refunds over $100 require human approval.",
  "parameters": {
    "type": "object",
    "properties": {
      "order_id":  { "type": "string", "description": "The internal order id, e.g. 'ord_123'" },
      "amount":    { "type": "number", "description": "Refund amount in USD" },
      "reason":    { "type": "string", "enum": ["damaged", "wrong_item", "late", "other"] }
    },
    "required": ["order_id", "amount", "reason"]
  }
}
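  • Tip: Encode policy in the description ("over $100 requires approval"); the model will route correctly far more often. Still enforce the rule in code, since a description guides the model but cannot bind it.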
  • Tip: Use enum on free-form fields (like reason) so the model returns a clean value you can switch on.
  • Tip: Make tool results structured, not freeform — downstream agents can parse them.
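On the runtime side, the policy the description states should also be enforced in code. A minimal sketch, assuming the `issue_refund` arguments above arrive as parsed JSON; `routeRefund`, its return shape, and the follow-up notes are hypothetical placeholders.

```typescript
type Reason = "damaged" | "wrong_item" | "late" | "other";
const REASONS: readonly Reason[] = ["damaged", "wrong_item", "late", "other"];

interface RefundArgs {
  order_id: string;
  amount: number;
  reason: Reason;
}

type Decision =
  | { action: "execute"; args: RefundArgs; note: string }
  | { action: "request_approval"; args: RefundArgs }
  | { action: "reject"; error: string };

function routeRefund(raw: Record<string, unknown>): Decision {
  const { order_id, amount, reason } = raw;
  if (typeof order_id !== "string" || order_id === "") {
    return { action: "reject", error: "order_id required" };
  }
  if (typeof amount !== "number" || !(amount > 0)) {
    return { action: "reject", error: "amount must be a positive number" };
  }
  if (!REASONS.includes(reason as Reason)) {
    return { action: "reject", error: `unknown reason: ${String(reason)}` };
  }
  const args: RefundArgs = { order_id, amount, reason: reason as Reason };

  // Same rule as the description, enforced in code: the model may ignore it.
  if (amount > 100) return { action: "request_approval", args };

  // The enum gives a clean value to switch on (notes are placeholders).
  switch (args.reason) {
    case "damaged":
    case "wrong_item":
      return { action: "execute", args, note: "also create a return label" };
    case "late":
      return { action: "execute", args, note: "flag carrier for review" };
    default:
      return { action: "execute", args, note: "log for manual follow-up" };
  }
}

console.log(routeRefund({ order_id: "ord_123", amount: 250, reason: "late" }).action);
// request_approval
```

Even if a prompt injection convinces the model to skip approval, the $250 refund still lands in the approval path.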
In this platform

How AgentSwarms uses tools

In AgentSwarms, tools are first-class objects. You attach them to an agent, the runtime validates calls, executes them with tracing, and routes the structured result back to the model — exactly the lifecycle described above.

Knowledge bases

Attach any KB you upload (PDF, DOCX, Markdown, raw text). The runtime exposes it as a query_knowledge_base tool with citations.

MCP servers

Connect any Model Context Protocol server (HTTP or stdio). Every tool the MCP server advertises shows up in your agent's tool palette automatically.

n8n / webhook tools

Point the agent at any n8n workflow or HTTPS webhook. Great for connecting to Slack, Notion, Stripe, Salesforce — anything with an API.

Provider integrations

OpenAI, Gemini, Anthropic, Grok, Bedrock, Vertex, OCI, Qwen, Azure — bring your own keys, or use Lovable AI with no key required.

Handoff edges (in swarms)

Wiring two nodes in the swarm canvas IS a handoff tool. The router agent calls transfer_to_<node> under the hood.
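One way to picture "an edge is a handoff tool": for each outgoing edge, generate a tool definition the routing node can call. This sketch reuses the OpenAI-style tool format from Concept 03 and the `from`/`to` edge fields of the portable schema. The definitions AgentSwarms actually generates are not documented here, so the description text and the `handoff_note` parameter are assumptions.

```typescript
interface Edge {
  from: string;
  to: string;
}

interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, { type: string; description: string }>;
      required: string[];
    };
  };
}

// One transfer_to_<node> tool per outgoing edge of `nodeId`.
function handoffTools(nodeId: string, edges: Edge[]): ToolDefinition[] {
  return edges
    .filter((e) => e.from === nodeId)
    .map((e): ToolDefinition => ({
      type: "function",
      function: {
        name: `transfer_to_${e.to}`,
        description: `Hand the task to the ${e.to} agent.`,
        parameters: {
          type: "object",
          properties: {
            handoff_note: { type: "string", description: "What the next agent needs to know." },
          },
          required: ["handoff_note"],
        },
      },
    }));
}

const edges: Edge[] = [
  { from: "router", to: "billing" },
  { from: "router", to: "support" },
  { from: "billing", to: "reviewer" },
];
console.log(handoffTools("router", edges).map((t) => t.function.name));
// [ 'transfer_to_billing', 'transfer_to_support' ]
```

A node with no outgoing edges gets no transfer tools, so the canvas wiring alone determines where work is allowed to flow.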

HITL approvals

Mark any action as requiring approval. The agent calls request_approval, the run pauses, and the request shows up in the Approvals inbox.

The mental model

Every tool in AgentSwarms — whether it's a KB lookup, an MCP call, an n8n webhook, a swarm handoff, or an approval request — flows through the same 6-step lifecycle. Same tracing. Same cost accounting. Same guardrails. That uniformity is what lets you debug a 50-step swarm run as easily as a single tool call.

Reference

Glossary — the agentic AI vocabulary

Agent
An LLM with a system prompt, optional tools, and memory — capable of multi-step reasoning toward a goal.
RAG
Retrieval-Augmented Generation. Inject relevant chunks from your docs into the prompt so the model can cite real sources.
Tool / Function call
A typed action the model can invoke (search_web, send_email, query_db). The agent decides when to call it.
Guardrail
Rules that filter input or output — PII redaction, profanity blocks, schema validation, cost caps.
HITL
Human-in-the-Loop. Agent pauses for human approval before doing something risky.
MCP
Model Context Protocol. A standard way to expose tools and data to any compatible agent.
Swarm
Multiple specialized agents that hand off work to each other.
Eval
A test suite for agents. Score outputs on accuracy, format, safety, cost — not just vibes.
Embedding
A numeric vector representation of text. Similar meanings → similar vectors.
Vector store
A database that indexes embeddings for fast similarity search (Pinecone, Weaviate, pgvector).
Token
A chunk of text the model reads or writes; roughly 4 characters of English text on average. You pay per token.
Temperature
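Controls sampling randomness. 0 = (nearly) deterministic, always picking the most likely token; higher values = more varied, "creative" output. Lower for facts, higher for brainstorming.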
Few-shot
Including examples of input→output pairs in the prompt to shape behavior.
Chain-of-thought
Asking the model to reason step-by-step before answering. Improves hard tasks, costs more tokens.
Prompt injection
User input that tries to override the system prompt. Treat as inevitable; design tools defensively.
LLM-as-judge
Using one LLM to grade another's output. Cheap eval, but bias-prone.
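The Temperature entry can be made precise: sampling divides the model's raw scores (logits) by the temperature before the softmax, so low values sharpen the distribution toward the top token and high values flatten it. A small sketch with made-up logits for three candidate tokens. Dividing by zero is undefined, so implementations typically special-case temperature 0 as always picking the top token.

```typescript
// Softmax with temperature: p_i = exp(z_i / T) / sum_j exp(z_j / T).
function softmax(logits: number[], temperature: number): number[] {
  if (temperature <= 0) throw new Error("use greedy (argmax) decoding for temperature 0");
  const scaled = logits.map((z) => z / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((z) => Math.exp(z - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [2.0, 1.0, 0.1]; // made-up scores for three candidate tokens

for (const t of [0.2, 1.0, 2.0]) {
  const probs = softmax(logits, t).map((p) => p.toFixed(3));
  console.log(`T=${t}: ${probs.join(", ")}`);
}
```

At T=0.2 the top token takes almost all of the probability mass; at T=2.0 it drops to roughly half, which is why high temperatures read as more varied.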

Reading is good. Building is better.

Open the lab, pick a template, and apply what you just read. Every in-app page has a side-rail explaining the concept you're touching — so you keep learning as you build.