- Agent
- An LLM with a system prompt, optional tools, and memory — capable of multi-step reasoning toward a goal.
- RAG
- Retrieval-Augmented Generation. Inject relevant chunks from your docs into the prompt so the model can cite real sources.
- Tool / Function call
- A typed action the model can invoke (search_web, send_email, query_db). The agent decides when to call it.
- Guardrail
- Rules that filter input or output — PII redaction, profanity blocks, schema validation, cost caps.
- HITL
- Human-in-the-Loop. Agent pauses for human approval before doing something risky.
- MCP
- Model Context Protocol. A standard way to expose tools and data to any compatible agent.
- Swarm
- Multiple specialized agents that hand off work to each other.
- Eval
- A test suite for agents. Score outputs on accuracy, format, safety, cost — not just vibes.
- Embedding
- A numeric vector representation of text. Similar meanings → similar vectors.
- Vector store
- A database that indexes embeddings for fast similarity search (Pinecone, Weaviate, pgvector).
- Token
- A chunk of text the model reads/writes. ~4 chars in English. You pay per token.
- Temperature
- 0 = deterministic, 1 = creative. Lower for facts, higher for brainstorming.
- Few-shot
- Including examples of input→output pairs in the prompt to shape behavior.
- Chain-of-thought
- Asking the model to reason step-by-step before answering. Improves hard tasks, costs more tokens.
- Prompt injection
- User input that tries to override the system prompt. Treat as inevitable; design tools defensively.
- LLM-as-judge
- Using one LLM to grade another's output. Cheap eval, but bias-prone.
- SQL agent (text-to-SQL)
- An agent equipped with a sql_query tool that turns natural-language questions into validated SELECT statements, executes them, and answers in plain English. In AgentSwarms: SELECT-only, AST-parsed, 50-row capped, RLS-isolated.
- Table allow-list
- Per-agent restriction (toolConfigs.sql_table_names) that limits which tables a SQL agent can read. Defense in depth on top of RLS.
- Parameters / weights
- The numbers inside a model that get adjusted during training. More ≠ always better, but capacity scales with them.
- Pre-training
- Initial training on a massive general corpus to build a base model that 'knows language' but not how to follow instructions.
- Fine-tuning
- Continued training on a smaller, curated dataset to specialize the model for a task, format, or domain.
- LoRA / QLoRA
- Parameter-efficient fine-tuning: train tiny adapter matrices instead of all weights. 10–100× cheaper, swappable per use case.
- SFT
- Supervised Fine-Tuning. Teach a model with (input, ideal output) pairs.
- RLHF / DPO
- Reinforcement Learning from Human Feedback / Direct Preference Optimization. Align a model to human preferences with chosen/rejected pairs.
- Distillation
- Train a small 'student' model to mimic a big 'teacher' model on a task. The standard way to make cheaper, faster specialists.
- SLM
- Small Language Model — typically 1B–14B params. Runs on a laptop or phone, often great for narrow tasks.
- VLM
- Vision-Language Model. Takes images alongside text. Examples: GPT-5 vision, Gemini, Claude with vision, Qwen-VL.
- Embedding model
- Maps text to a vector. Similar meanings → nearby vectors. The engine of RAG.
- Re-ranker
- Given a query + candidate doc, scores precise relevance. Slower than embeddings, far more accurate. Highest-ROI RAG upgrade.
- Reasoning model
- An LLM trained to generate a long internal chain-of-thought before answering. Better on hard problems, slower & costlier.
- ReAct
- Reason + Act prompting pattern: Thought → Action (tool) → Observation → Thought… The default loop for tool-using agents.
- Self-consistency
- Run chain-of-thought multiple times, take the majority answer. Trades cost for accuracy.
- Speculative decoding
- Inference trick: a tiny draft model proposes tokens, the big model verifies. Same outputs, often 2–3× faster.
- Catastrophic forgetting
- When fine-tuning makes a model lose general capabilities it used to have. Mitigated with mixed data and small-step training.
- Skill
- A reusable, structured markdown playbook (when-to-use + steps + constraints) attached to an agent. Composable; multiple skills can stack.
- System prompt
- The agent's persistent identity, tone, and hard rules — set once, always loaded. Skills cover situational know-how on top.
- Agent loop
- The perceive → reason → act → observe → repeat cycle that makes an LLM 'agentic.' Terminates when the task is done or a limit is hit.
- Context window
- The maximum number of tokens a model can read + write in one API call. Input, output, and system prompt share this budget.
- Token
- The atomic unit models process — roughly ¾ of a word. All costs and limits are measured in tokens.
- BPE (Byte-Pair Encoding)
- The tokenization algorithm used by GPT, Claude, and most modern LLMs. Splits text into subword units based on frequency.
- Vector database
- A store optimized for fast approximate nearest-neighbor search over embedding vectors. Powers semantic search and RAG.
- HNSW
- Hierarchical Navigable Small World — the most common ANN index algorithm. O(log n) search with high recall.
- Function calling
- A protocol where the model returns a structured tool_call instead of text, the runtime executes it, and the result feeds back into the conversation.
- MCP (Model Context Protocol)
- An open standard for exposing tools to LLMs. Write one server, any MCP-compatible agent can discover and call its tools.
- Cosine similarity
- Measures the angle between two vectors. 1.0 = identical direction, 0 = orthogonal, -1 = opposite. The standard metric for embedding search.
- SLO / SLA
- Service Level Objective / Agreement — measurable promises about latency, uptime, and quality.
- p95 / p99
- The latency the slowest 5% (or 1%) of users see. The number that actually matters at scale.
- Circuit breaker
- Auto-stops calls to a failing dependency for a cooldown so you don't make things worse.
- Bulkhead
- Resource isolation so one noisy tenant can't starve everyone else (separate pools/queues).
- Canary deploy
- Roll a change to 1–5% of traffic first; monitor; then expand.
- Shadow traffic
- Run the new version in parallel without showing its output to users; compare offline.
- HITL
- Human-in-the-loop — a human approves a step before the agent proceeds (e.g. send the email).
- Blast radius
- How much damage a single failed action can cause (read-only vs. send-money).
- Game day
- Planned exercise where you intentionally break parts of the system to test resiliency.
- Model gateway
- A proxy in front of multiple LLM providers for routing, fallback, caching, logging.
- Drift
- Slow degradation in model quality over time — same prompt, gradually worse outputs.
- Eval gate
- A CI step that blocks deploy if the prompt/model/tool change regresses the eval suite.