Run & observe

Logs & traces

Every run on the platform — playground chats, swarm nodes, notebook calls — is recorded as a trace. Reading traces is the core debugging skill in agentic systems, and the one course environments almost never let you practice.

The traces table

/traces lists every run with the agent name, provider and model, latency, tokens in and out, dollar cost, status, and timestamp. Sort or scan for the rows that look wrong — the red statuses, the latency outliers, the runs that cost ten times their neighbours.

What a trace contains

Selecting a run opens the full record:

Metrics: Latency, tokens in, tokens out, and cost computed from model pricing.
Prompt: The prompt for the run, with the agent and model that handled it.
Tool calls: Each tool the model called, with the arguments it chose and the result it got back. If a tool you expected is never called, check the request payload to confirm it was offered at all.
Request / response payloads: The raw provider request and response. This is the ground truth: the exact message array, parameters, and tool definitions the model actually received, and exactly what it returned.
Error: For failed runs, the error message the runtime captured.

The playground inspector

While chatting in the Playground, the inspector panel shows the same information live, in three tabs: the latest request/response exchange, the stream of tool events as they happen (with a running call count), and the trace for the current conversation. For swarm runs, the observability view adds the per-node timeline.

A debugging method that works

Reproduce, then read. Re-run the failing input, open the trace, and read the request payload before forming a theory. Most "the model is broken" reports turn out to be "the model was sent something other than what I assumed".
Change one thing. Adjust a single line of prompt, one parameter, or the model — then run the same input and compare the two traces. Keeping the old trace open in a second tab is the closest thing prompt engineering has to a scientific method.
Watch cost as a signal. A run whose cost jumps an order of magnitude usually means a loop, a context blow-up, or a tool feeding the model far more text than intended — the trace shows which.

The Failure Modes Lab notebook and the canvas failure labs are guided practice for exactly this skill: each one produces a broken trace and asks you to find the cause.

PreviousChat Playground Next Analytics