Section 10

Analytics and observability

The analytics section at /analytics is the control tower for everything that has ever run on your workspace. It is split into three views — an overview, a trace viewer, and a dedicated swarm observability canvas — that all read from the same underlying telemetry store. Spending time in analytics is one of the highest-leverage things you can do as you start to take any of your agents seriously, because almost every meaningful improvement to a real agentic system is downstream of a specific observation about how the system behaves under real load.

Overview

The overview is the top-level analytics page, designed to be read in about fifteen seconds. It shows the health and economics of your workspace, with the items most likely to require action made the most visible. Every chart can be drilled into for a more detailed view, and every number is clickable to filter the trace viewer down to the runs that produced it.

  • Runs per day is split into agents, swarms, and notebooks and stacked so you can see the composition of your usage over time. A sudden spike usually indicates either an exciting new experiment or a runaway loop, and either is worth understanding within a few minutes of it happening.
  • Success and error ratio shows the percentage of runs in the selected window that completed cleanly versus those that errored. The errored portion is colour-coded by error type so you can immediately see whether you are dealing with provider rate limits, tool timeouts, guardrail refusals, or genuine application errors.
  • p50 and p95 latency are reported per agent, per swarm, and per tool. The p95 line is the one to watch for capacity planning, because the p50 will look fine long after a small fraction of users have started having a miserable experience.
  • Spend is broken down by provider, model, agent, and swarm over the selected window, with both stacked bar and pie views. Switching between groupings is often enough to surface the surprise — usually one agent on one expensive model is responsible for most of the spend.
  • Top failing tools and top slow tools are ranked tables that point you directly at the parts of your system that need attention next. A tool that is both slow and frequently failing is almost always your highest-leverage fix target.
  • Budget burn-down charts your month-to-date spend against your configured cap and projects month-end based on the last seven days of usage. A projection that turns red is your cue to either raise the cap or tighten the agents responsible.
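
As a rough illustration of the burn-down projection described above, the sketch below adds month-to-date spend to a trailing seven-day daily average multiplied by the days left in the month. The exact formula the platform uses is not documented here, so treat this as one plausible reading; the function names and the daily_spend input shape are hypothetical.

```python
import calendar
from datetime import date


def project_month_end(daily_spend: dict[date, float], today: date, window_days: int = 7) -> float:
    """Project month-end spend as month-to-date plus a trailing daily average.

    Assumption: the projection extends the average of the last `window_days`
    days (including today) over the remaining days of the month.
    """
    month_to_date = sum(
        amount
        for day, amount in daily_spend.items()
        if day.year == today.year and day.month == today.month and day <= today
    )
    # Spend recorded in the trailing window; days with no entry count as zero.
    recent = [
        amount
        for day, amount in daily_spend.items()
        if 0 <= (today - day).days < window_days
    ]
    daily_rate = sum(recent) / window_days
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    remaining_days = days_in_month - today.day
    return month_to_date + daily_rate * remaining_days


def projection_is_red(projected: float, cap: float) -> bool:
    """True when the projected month-end spend exceeds the configured cap."""
    return projected > cap
```

With ten days of flat 10.00 spend in a thirty-day month, the sketch projects 300.00 at month end, which would turn red against a cap of 250.00.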

Trace viewer

Every run on the platform — every playground turn, every swarm execution, every notebook cell that hit a model — produces a trace, and the trace viewer is the universal way to inspect them. The viewer is the same component used by the playground inspector covered in the previous section, but at workspace scale rather than per-message scope, with filtering and comparison affordances that only make sense once you have many traces to look through.

  • Timeline view renders every model call, tool call, knowledge-base lookup, and memory operation for a single run as a waterfall, with timing precise enough to spot serial calls that should have been parallel.
  • Per-call dollar cost appears next to every model call in the timeline using live pricing from the model registry, with the trace's total cost summed at the top. Sorting the timeline by cost is often a faster route to optimisation than profiling latency.
  • Full request and response payloads are available behind a "show raw" toggle on every call, with secrets and PII redacted before display. The redaction is server-side so even a user with full trace access cannot recover the original sensitive values.
  • Filtering works by run status, tool name, model, agent, swarm, time window, and arbitrary tag. Saved filters become shareable URLs you can paste into a Slack thread or a postmortem document.
  • Side-by-side comparison diffs two traces chosen from the list. The diff is structural — same tool called twice, different arguments — rather than textual, which is by far the more useful comparison for traces.
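
To make the difference between a structural and a textual diff concrete, here is a minimal sketch that compares two traces call by call and reports differing tools and arguments. The trace shape (a list of dicts with "tool" and "args" keys) and the function name are assumptions for illustration, not the viewer's actual data model, and the positional alignment is deliberately naive.

```python
from itertools import zip_longest
from typing import Any

# Hypothetical call shape: {"tool": "search", "args": {"q": "...", "top_k": 5}}
Call = dict[str, Any]


def diff_traces(a: list[Call], b: list[Call]) -> list[str]:
    """Compare two traces step by step and describe structural differences."""
    findings: list[str] = []
    for i, (call_a, call_b) in enumerate(zip_longest(a, b)):
        if call_a is None:
            findings.append(f"step {i}: only in B: {call_b['tool']}")
        elif call_b is None:
            findings.append(f"step {i}: only in A: {call_a['tool']}")
        elif call_a["tool"] != call_b["tool"]:
            findings.append(f"step {i}: tool differs: {call_a['tool']} vs {call_b['tool']}")
        else:
            # Same tool at the same step: report only the arguments that differ.
            for key in sorted(set(call_a["args"]) | set(call_b["args"])):
                va, vb = call_a["args"].get(key), call_b["args"].get(key)
                if va != vb:
                    findings.append(f"step {i}: {call_a['tool']}.{key}: {va!r} vs {vb!r}")
    return findings


good = [
    {"tool": "search", "args": {"q": "refund policy", "top_k": 5}},
    {"tool": "fetch", "args": {"id": 1}},
]
bad = [{"tool": "search", "args": {"q": "refund policy", "top_k": 50}}]

for line in diff_traces(good, bad):
    print(line)
```

On this sample the sketch reports that search was called with top_k 5 in one run and 50 in the other, and that the fetch call exists only in the first trace. A textual diff of the raw payloads would bury both facts in formatting noise.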

Swarm observability

The dedicated view at /analytics/observability renders a single swarm run as an animated flow diagram that recreates the canvas layout of the swarm with telemetry overlaid on top. This view exists because a multi-agent system's failure modes are spatial — about which node, which edge, which branch — and a flat timeline does not do them justice. The observability canvas is the right place to start any swarm-level troubleshooting.

  • Node colouring reflects status: queued, running, succeeded, errored, or skipped. A glance at the canvas tells you immediately where the run stopped and whether any branches were short-circuited.
  • Edge thickness represents payload size, so you can see at a glance which edges are carrying a lot of data and which are carrying very little — useful for spotting accidentally large payloads being passed between nodes.
  • Edge colour represents latency band: fast edges are cool-coloured, slow edges are warm-coloured. A noticeably warm edge is almost always the consequence of a slow upstream node and a good clue about where to look first.
  • Hover information on every node surfaces the prompt, the tools available, the tokens consumed, the cost, and the upstream payload received. The amount of information per hover is deliberately high because the alternative is clicking through to a separate trace, which breaks the spatial reasoning the canvas is designed to support.
  • Click-through to trace on any node opens the node's per-call trace in the standard trace viewer, for the cases where the hover summary is not enough.
  • Critical-path highlight traces the longest dependency chain through the swarm in a contrasting colour. That path is by definition the swarm's latency floor; any optimisation that does not touch a node on the critical path will not reduce overall latency at all.
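
The critical-path idea can be sketched as a longest-path computation over the swarm's dependency graph. The code below is an illustration under stated assumptions, not the canvas's implementation: node durations are known, edges add no latency of their own, every node appears in durations, and the names critical_path, durations, and upstream are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Optional


def critical_path(
    durations: dict[str, float],
    upstream: dict[str, list[str]],
) -> tuple[float, list[str]]:
    """Return (total latency, node list) for the longest dependency chain."""
    graph = {node: upstream.get(node, []) for node in durations}
    finish: dict[str, float] = {}
    slowest_pred: dict[str, Optional[str]] = {}
    # static_order() yields each node only after all of its predecessors.
    for node in TopologicalSorter(graph).static_order():
        prev = max(graph[node], key=lambda p: finish[p], default=None)
        start = finish[prev] if prev is not None else 0.0
        finish[node] = start + durations[node]
        slowest_pred[node] = prev
    # Walk back from the node that finishes last to recover the path.
    cursor: Optional[str] = max(finish, key=lambda n: finish[n])
    total = finish[cursor]
    path: list[str] = []
    while cursor is not None:
        path.append(cursor)
        cursor = slowest_pred[cursor]
    return total, path[::-1]


durations = {"plan": 1.0, "search": 4.0, "summarise": 2.0, "critic": 0.5, "write": 1.5}
upstream = {
    "search": ["plan"],
    "summarise": ["plan"],
    "critic": ["summarise"],
    "write": ["search", "critic"],
}
total, path = critical_path(durations, upstream)
```

In this sample the critical path is plan, search, write. Halving the duration of summarise leaves the total unchanged, because summarise is not on that path; halving search does reduce it.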

Troubleshooting workflows

Most real troubleshooting on AgentSwarms follows one of four patterns, and the analytics section was specifically designed to make each of them short. Internalising these patterns is worth more than reading any number of generic guides, because they are the patterns that actually come up in practice.

  • "A run failed." Open the trace, scroll to the first red entry, inspect the tool arguments and the tool response that triggered the failure, and decide whether the bug is in the tool, in the agent's prompt, or in the model's reasoning. The cause is almost always visible within a few entries of that first red one.
  • "A swarm is slow." Open the swarm observability view, look at the critical-path highlight, and ask whether the slow node could be parallelised, replaced with a faster model, or removed entirely. Most slow swarms have one obvious bottleneck rather than a collection of small inefficiencies.
  • "A run is expensive." Open the trace, sort by cost descending, and inspect the most expensive calls. The culprit is usually an over-long context window being assembled from a verbose tool result that could have been summarised before being fed back to the model.
  • "The agent is hallucinating." Open two traces — one good, one bad — side by side. Compare the resolved system prompts, the tool catalogues, and the knowledge-base context. Differences between the two are your candidate root causes; identical traces with different outputs point at sampling variance and are a signal to lower temperature or add a critic loop.
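
If you export trace data and want to apply the first two trace-based patterns in a script, they reduce to two small operations: find the first errored entry, and rank entries by cost. The sketch below assumes a hypothetical exported span shape with "name", "status", and "cost_usd" keys; the platform's actual export format may differ.

```python
from typing import Any, Optional

# Hypothetical span shape: {"name": str, "status": "ok" | "error", "cost_usd": float}
Span = dict[str, Any]


def first_error(spans: list[Span]) -> Optional[Span]:
    """Return the earliest errored span, or None if the run was clean."""
    return next((span for span in spans if span["status"] == "error"), None)


def most_expensive(spans: list[Span], n: int = 3) -> list[Span]:
    """Return the n costliest spans, most expensive first."""
    return sorted(spans, key=lambda span: span["cost_usd"], reverse=True)[:n]


spans = [
    {"name": "plan", "status": "ok", "cost_usd": 0.01},
    {"name": "search", "status": "error", "cost_usd": 0.00},
    {"name": "summarise", "status": "ok", "cost_usd": 0.12},
    {"name": "retry", "status": "error", "cost_usd": 0.02},
]
```

Here first_error(spans) points at the search call rather than the later retry, which matches the advice to start from the first red entry rather than the last.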

Cost per model

Pricing on AgentSwarms is never hidden behind aggregate dashboards. Every model call, on every screen that shows model calls, exposes the per-call dollar cost computed from the live Model Registry. The registry is refreshed regularly from the providers' public pricing pages, with manual overrides when a provider's pricing is non-obvious (per-image generation, per-second video, cached-token discounts), and every change to the registry is logged so you can reconstruct what a historical run actually cost.

  • Per-call breakdown shows the input tokens at the input price, the output tokens at the output price, and any cached tokens at the cached price for providers that support prompt caching. The three components are summed and shown next to the call in the trace.
  • Per-model comparison on the model registry page lets you pick any two models and see a side-by-side of input price, output price, cached price, context window, tool-calling support, structured-output support, and a quality-tier band. This is the screen to use before changing the model on a high-volume agent.
  • Gateway markup applied through the AgentSwarms AI gateway is shown explicitly when you are using the gateway rather than a bring-your-own key. The markup goes to running the gateway, not to a margin on model inference; we keep it small and transparent on purpose.
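
The per-call breakdown is simple arithmetic, sketched below. The prices are made-up placeholders rather than real registry values, pricing is assumed to be quoted per million tokens, and input_tokens is assumed to exclude cached tokens; the class and function names are hypothetical. Gateway markup is left out because its calculation is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    # Placeholder prices in dollars per million tokens (assumed unit).
    input_per_mtok: float
    output_per_mtok: float
    cached_per_mtok: float


@dataclass(frozen=True)
class CallUsage:
    input_tokens: int       # assumption: uncached input tokens only
    output_tokens: int
    cached_tokens: int = 0  # tokens served from the provider's prompt cache


def call_cost(usage: CallUsage, price: ModelPrice) -> dict[str, float]:
    """Break a single model call's cost into input, output, and cached parts."""
    input_cost = usage.input_tokens * price.input_per_mtok / 1_000_000
    output_cost = usage.output_tokens * price.output_per_mtok / 1_000_000
    cached_cost = usage.cached_tokens * price.cached_per_mtok / 1_000_000
    return {
        "input": input_cost,
        "output": output_cost,
        "cached": cached_cost,
        "total": input_cost + output_cost + cached_cost,
    }


example_price = ModelPrice(input_per_mtok=2.0, output_per_mtok=8.0, cached_per_mtok=0.5)
example_usage = CallUsage(input_tokens=1_000_000, output_tokens=500_000, cached_tokens=2_000_000)
breakdown = call_cost(example_usage, example_price)
```

With these placeholder numbers the call costs 2.00 for input, 4.00 for output, and 1.00 for cached tokens, for a total of 7.00. Serving the same two million cached tokens at the full input price would have cost 4.00 instead of 1.00.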