Logs, traces and tool-calling in the playground
The single most important debugging surface on AgentSwarms is the trace produced by every run, and the playground exposes that trace immediately alongside the chat conversation rather than burying it behind a separate analytics screen. Every assistant turn has an "Inspect" button that opens the trace for that turn in a side panel; from there you can see exactly what the model decided at every step, why it decided it, what it called, how much it cost, and how long it took. Spending time in the trace inspector is the fastest way to develop a real intuition for how agents behave.
What a trace contains
Each trace is a full, time-ordered record of everything that happened during a single run. The trace inspector renders this record as an expandable tree with the most important information visible at every level. The list below is in the order the items appear in a typical trace, which mirrors the order in which the runtime actually executed them.
- Resolved system prompt is the system prompt as the model actually saw it, after long-term memory summaries, knowledge-base context, and any template variables have been merged in. The "resolved" qualifier matters because the prompt the model sees is rarely the exact text you typed into the Agent Builder; understanding the diff between the two is often the key to debugging unexpected behaviour.
- Tool catalogue is the full list of tools (with their descriptions and input schemas) that the runtime sent to the model that turn. If a tool you expected the model to call is missing from this list, the bug is in the agent configuration, not in the model's reasoning.
- Each tool call appears as its own entry with the tool name, the JSON arguments the model chose, the raw response from the tool, and the latency in milliseconds. Errors during tool calls (timeouts, validation failures, upstream HTTP errors) are highlighted in red and the trace continues so you can see what the agent did about it.
- Token-level cost breakdown shows the input tokens, output tokens, and any cached tokens used by that call, multiplied by the live per-token prices from the model registry to produce a dollar cost. The trace's total cost is the sum of the per-call costs and lets you see at a glance which calls in a long run were the expensive ones.
- Model finish reason tells you why the model stopped generating: a normal stop, a tool-call request that the runtime now needs to handle, a length cap hit, or a content-safety filter triggered. Each finish reason has different implications for what the runtime does next.
- Guardrail decisions are logged whenever an input filter, output filter, or PII blocker fired. For each decision the trace shows what was matched, what was rewritten or refused, and whether the guardrail required human approval, in which case the approval request has been posted to your dashboard inbox.
- Memory writes are visible as separate entries showing what facts the long-term-memory extractor decided to persist from this run. Reviewing memory writes periodically is one of the easiest ways to catch a regression where the agent has started persisting noisy or irrelevant facts.
- Errors and retries capture any internal error, any retry the runtime performed, and any fallback-model trigger. The fallback-model entry explicitly names the primary model, the reason for the fallback (rate limit, error, safety refusal), and the fallback model that actually answered.
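The cost arithmetic described in the list above can be sketched in a few lines of Python. This is an illustration only: the `Prices` and `ModelCall` types, their field names, and the example prices are hypothetical and are not AgentSwarms' actual trace schema or registry values. It also assumes cached tokens are counted separately from input tokens and billed at their own rate.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Prices:
    """Hypothetical per-token prices in USD (stand-in for the model registry)."""
    input_per_token: float
    output_per_token: float
    cached_per_token: float


@dataclass(frozen=True)
class ModelCall:
    """Hypothetical token counts for one model call in a trace."""
    name: str
    input_tokens: int
    output_tokens: int
    cached_tokens: int  # assumed to be counted separately from input_tokens


def call_cost(call: ModelCall, prices: Prices) -> float:
    """Dollar cost of one call: each token count times its per-token price."""
    return (
        call.input_tokens * prices.input_per_token
        + call.output_tokens * prices.output_per_token
        + call.cached_tokens * prices.cached_per_token
    )


def trace_cost(calls: List[ModelCall], prices: Prices) -> float:
    """Total cost of a trace: the sum of its per-call costs."""
    return sum(call_cost(c, prices) for c in calls)


def most_expensive(calls: List[ModelCall], prices: Prices) -> ModelCall:
    """The call that contributed the most to the total."""
    return max(calls, key=lambda c: call_cost(c, prices))


# Example with made-up prices and token counts.
prices = Prices(input_per_token=3e-6, output_per_token=15e-6, cached_per_token=3e-7)
calls = [
    ModelCall("plan", input_tokens=1200, output_tokens=150, cached_tokens=800),
    ModelCall("answer", input_tokens=4000, output_tokens=900, cached_tokens=0),
]
print(f"total: ${trace_cost(calls, prices):.6f}")
print(f"most expensive call: {most_expensive(calls, prices).name}")
```

Keeping the per-call cost as its own function mirrors the way the trace reports cost per call first and the total second.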
Verbose and step-through modes
Two toggles at the top of the trace inspector unlock deeper inspection for the cases where the standard view is not enough. Both are off by default because they produce a lot of output and are only useful in narrow circumstances, but in those circumstances they are the difference between a productive debugging session and a frustrated one.
- Verbose mode shows the raw provider request and response payloads, including every header, every parameter, and the full unredacted message array sent to the model. This is the mode to use when you suspect a bug at the provider integration layer, such as wrong parameter encoding, an unexpected role marker, or a stop sequence interfering with output.
- Step-through mode pauses the run before every tool call and asks for explicit human approval to continue. This is the mode to use when you are trying to understand exactly which tool a runaway agent is calling and why, without letting the agent actually take any of the destructive actions in question.
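The idea behind step-through mode, an approval gate in front of every tool call, can be sketched as follows. This is a minimal illustration, not the AgentSwarms runtime: `run_tool_calls`, the `approve` callback, and the log format are hypothetical names, and the sketch assumes that a rejected call stops the run, which the description above does not specify.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types: a tool is a plain function, and an approver decides
# whether a proposed call (tool name + arguments) may run.
Tool = Callable[..., Any]
Approver = Callable[[str, Dict[str, Any]], bool]


def run_tool_calls(
    calls: List[Tuple[str, Dict[str, Any]]],
    tools: Dict[str, Tool],
    approve: Approver,
) -> List[Dict[str, Any]]:
    """Run proposed tool calls in order, asking for approval before each one."""
    log: List[Dict[str, Any]] = []
    for name, args in calls:
        if not approve(name, args):
            # Assumption: a rejection ends the run; the tool is never invoked.
            log.append({"tool": name, "args": args, "status": "rejected"})
            break
        result = tools[name](**args)
        log.append({"tool": name, "args": args, "status": "executed", "result": result})
    return log


def console_approver(name: str, args: Dict[str, Any]) -> bool:
    """Interactive approver: show the proposed call and wait for y/n."""
    answer = input(f"Run {name} with {args}? [y/N] ")
    return answer.strip().lower() == "y"
```

Because the gate sits before the tool function is invoked, a rejected destructive call never executes, yet the proposed tool name and arguments are still recorded for inspection.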
Diffing two traces
Whenever you change one thing about an agent (a single line in the system prompt, the temperature, the model), the recommended discipline is to keep the old trace open and diff it against the new trace produced by the same input. The trace inspector has a "compare with" picker that shows two traces side by side and highlights the differences at every level: a different tool was called, the same tool was called with different arguments, or the output token count grew by an order of magnitude. This habit is the closest thing prompt engineering has to a scientific method, and it is by far the most reliable way to know whether a change was actually an improvement.
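If you export traces and want to script the same comparison, the kinds of differences listed above can be detected with a short function. This is a rough sketch under stated assumptions: each trace is represented here as a plain list of dicts with hypothetical `tool` and `args` keys, which is not a documented AgentSwarms export format, and calls are compared position by position, which is simpler than what a real diff tool would likely do.

```python
from itertools import zip_longest
from typing import Any, Dict, List

ToolCall = Dict[str, Any]  # hypothetical shape: {"tool": str, "args": dict}


def diff_tool_calls(old: List[ToolCall], new: List[ToolCall]) -> List[str]:
    """Compare two tool-call sequences step by step and describe each difference."""
    diffs: List[str] = []
    for step, (a, b) in enumerate(zip_longest(old, new)):
        if a is None:
            diffs.append(f"step {step}: added call to {b['tool']}")
        elif b is None:
            diffs.append(f"step {step}: removed call to {a['tool']}")
        elif a["tool"] != b["tool"]:
            diffs.append(f"step {step}: tool changed {a['tool']} -> {b['tool']}")
        elif a["args"] != b["args"]:
            diffs.append(f"step {step}: {a['tool']} arguments changed")
    return diffs


def grew_by_order_of_magnitude(old_tokens: int, new_tokens: int) -> bool:
    """True when the output token count grew at least tenfold."""
    return old_tokens > 0 and new_tokens >= 10 * old_tokens
```

Positional comparison reports an inserted call as a change at that step and every step after it; an alignment-based diff would give cleaner results on long runs, at the cost of more code.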