Telemetry
Vystak deployments emit OpenTelemetry traces and metrics when telemetry is configured on Platform. Channels and agents are pre-instrumented; trace context (W3C traceparent) propagates across HTTP and NATS so a single user message produces one connected trace from the entry channel through every agent it touches. Token-usage metrics from each model call ship to the same endpoint.
Turning it on
Add telemetry=... to Platform. With Docker, the simplest form is enough:
```python
import vystak as ast

docker = ast.Provider(name="docker", type="docker")

platform = ast.Platform(
    name="local",
    type="docker",
    provider=docker,
    telemetry=ast.Telemetry(),
)
```
When telemetry is enabled and no endpoint is set, the Docker provider auto-provisions a grafana/otel-lgtm container on vystak-net. It bundles Tempo (traces) + Mimir (metrics) + Grafana (UI) behind a single OTLP gRPC receiver, so traces and metrics land in one place and you browse them through one UI at http://localhost:13000.
To send to your own collector instead:
```python
telemetry=ast.Telemetry(endpoint="http://my-collector:4317")
```
To turn it off entirely, omit the field — agents and channels skip OTel init and pay no instrumentation cost.
What gets instrumented
- FastAPI: every inbound HTTP request (e.g. an agent's /a2a endpoint) becomes a server span automatically.
- httpx: outbound HTTP from agents and channels (subagent calls, model calls) becomes a client span automatically and injects traceparent.
- NATS path: traceparent is injected into the JSON-RPC envelope's params._meta.headers on publish, and the NATS↔HTTP bridge inside the receiving agent extracts it and starts the agent-side span as a child of the caller's.
- Slack channel: the Socket Mode handler doesn't go through FastAPI, so the channel manually wraps each message and app_mention event in a root span (slack.message / slack.app_mention). Without this, traces from Slack would be disconnected.
- GenAI calls: a LangChain callback handler opens a gen_ai.chat span around every model call, attaches token-usage attributes when the LLM responds, and emits OTel histogram metrics (see below). Both follow the OTel GenAI semantic conventions.
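The NATS path in the list above can be sketched in plain Python. This is a minimal illustration, not Vystak's implementation: the helper names (`make_traceparent`, `inject`, `extract`) and the `example/method` JSON-RPC method are hypothetical, and the header is built by hand where a real service would normally rely on the OpenTelemetry propagator API. Only the W3C traceparent format and the params._meta.headers location come from the text.

```python
import re
from typing import Optional

# W3C traceparent: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a version-00 traceparent value for the caller's current span."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def inject(envelope: dict, traceparent: str) -> dict:
    """Publisher side: store traceparent under params._meta.headers."""
    params = envelope.setdefault("params", {})
    headers = params.setdefault("_meta", {}).setdefault("headers", {})
    headers["traceparent"] = traceparent
    return envelope


def extract(envelope: dict) -> Optional[dict]:
    """Receiver side: recover the caller's ids so a child span can be started."""
    headers = envelope.get("params", {}).get("_meta", {}).get("headers", {})
    match = _TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None  # no usable context: the receiver would start a new trace
    _version, trace_id, parent_span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": int(flags, 16) & 1 == 1,
    }


# Hypothetical envelope; the ids are the example values from the W3C spec.
envelope = inject(
    {"jsonrpc": "2.0", "id": 1, "method": "example/method"},
    make_traceparent("0af7651916cd43dd8448eb211c80319c", "b7ad6b7169203331"),
)
caller = extract(envelope)
```

Because the receiver reuses the caller's trace id and treats the caller's span id as its parent, spans on both sides of the NATS hop end up in the same trace.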
The result: a single Slack @bot mention that the coordinator agent forwards to two specialist agents shows up as one trace with ~150–200 spans across four services, plus one or more histogram data points per model call on the metrics side.
Token-usage metrics
Every chat-model call publishes:
| Instrument | Type | Description |
|---|---|---|
| gen_ai.client.token.usage | Histogram | Tokens consumed per call. One data point per direction. |
| gen_ai.client.operation.duration | Histogram | Wall-clock time for the model call (seconds). |
The token histogram is broken down by gen_ai.token.type ∈ {input, output, cache_read, cache_creation}. Other attributes on every metric:
| Attribute | Example | Source |
|---|---|---|
| gen_ai.system | anthropic | Default for Vystak's anthropic-compat endpoints |
| gen_ai.request.model | claude-haiku-4-5-20251001 | LangChain response_metadata.model |
| service.name | vystak-assistant-agent | Resource attribute set on the MeterProvider |
The same numbers are also stamped on a dedicated gen_ai.chat span (SpanKind.CLIENT) that the runtime opens around every model call and closes when the LLM responds. So in Tempo (or any tracing UI) each model call shows up as one span carrying:
- gen_ai.usage.input_tokens
- gen_ai.usage.output_tokens
- gen_ai.usage.cache_read_input_tokens
- gen_ai.usage.cache_creation_input_tokens
- gen_ai.request.model, gen_ai.system
The runtime owns this span end-to-end on purpose: a2a-sdk dispatches LangGraph in a background task that outlives the FastAPI request, so the agent's server span often closes before the model call finishes. Stamping the attributes on a span the runtime owns guarantees they are visible for every call.
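As a rough sketch of how one model response could fan out into the histogram data points and span attributes described above, consider the following. It is an illustration under assumptions, not the runtime's code: the function names are hypothetical, the input dict is modeled on the shape of LangChain's usage metadata (`input_tokens`, `output_tokens`, `input_token_details`), and skipping zero counts is a choice made for this sketch.

```python
def usage_to_data_points(usage: dict, model: str, system: str = "anthropic") -> list:
    """Return (value, attributes) pairs, one per non-zero token type."""
    details = usage.get("input_token_details", {})
    counts = {
        "input": usage.get("input_tokens", 0),
        "output": usage.get("output_tokens", 0),
        "cache_read": details.get("cache_read", 0),
        "cache_creation": details.get("cache_creation", 0),
    }
    base = {"gen_ai.system": system, "gen_ai.request.model": model}
    return [
        (value, {**base, "gen_ai.token.type": token_type})
        for token_type, value in counts.items()
        if value  # assumption: zero counts are not recorded
    ]


def usage_to_span_attributes(usage: dict, model: str, system: str = "anthropic") -> dict:
    """Build the attribute set stamped on the gen_ai.chat span."""
    details = usage.get("input_token_details", {})
    return {
        "gen_ai.system": system,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": usage.get("input_tokens", 0),
        "gen_ai.usage.output_tokens": usage.get("output_tokens", 0),
        "gen_ai.usage.cache_read_input_tokens": details.get("cache_read", 0),
        "gen_ai.usage.cache_creation_input_tokens": details.get("cache_creation", 0),
    }


# Made-up numbers for one call that hit the prompt cache.
example_usage = {
    "input_tokens": 1200,
    "output_tokens": 85,
    "input_token_details": {"cache_read": 1000},
}
points = usage_to_data_points(example_usage, "claude-haiku-4-5-20251001")
attrs = usage_to_span_attributes(example_usage, "claude-haiku-4-5-20251001")
```

In a real handler each pair in `points` would be passed to a histogram's record call, and `attrs` would be set on the span before it is closed.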
In Grafana, query the metrics with PromQL:
```promql
# Total input tokens across all agents in the last hour
sum(increase(gen_ai_client_token_usage_sum{gen_ai_token_type="input"}[1h]))

# Cache hit ratio per agent
sum by (service_name) (rate(gen_ai_client_token_usage_sum{gen_ai_token_type="cache_read"}[5m]))
/
sum by (service_name) (rate(gen_ai_client_token_usage_sum{gen_ai_token_type="input"}[5m]))
```
For per-call drill-in, query Tempo by span name gen_ai.chat — that's one row per LLM call, with the input/output/cache_read counts in the span attributes panel.
Service naming
Each container reports under a deterministic service.name:
| Component | service.name |
|---|---|
| Agent | vystak-{agent-name} |
| Chat channel | vystak-channel-chat |
| Slack channel | vystak-channel-slack |
| Discord channel | vystak-channel-discord |
So a Tempo query like "Service: vystak-channel-slack, Operation: slack.app_mention" gives you the entry point of every Slack-triggered turn.
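The naming scheme in the table is simple enough to express as a small helper, which can be handy when building dashboards or queries programmatically. The function below is a hypothetical sketch derived only from the table above; it is not part of Vystak's API.

```python
def service_name(component: str, name: str) -> str:
    """Derive service.name from the component kind and its name.

    component: "agent" or "channel"
    name: the agent name, or the channel type (chat, slack, discord)
    """
    if component == "agent":
        return f"vystak-{name}"
    if component == "channel":
        return f"vystak-channel-{name}"
    raise ValueError(f"unknown component kind: {component!r}")


agent_service = service_name("agent", "assistant-agent")
slack_service = service_name("channel", "slack")
```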
Suppressed noise
The bundled telemetry init registers a SpanProcessor that downgrades known a2a-sdk control-flow exceptions (currently culsans.QueueShutDown on a2a.server.events.event_queue_v2.* paths) from ERROR to UNSET. The a2a-sdk uses these as end-of-stream signals, catches them internally, and does not propagate them — but its @trace_function decorator stamps the span ERROR before the catch fires. Without the suppressor, every successful turn would surface 3+ red spans that don't represent failures.
Real exceptions on the same code path (ValueError, RuntimeError, etc.) still surface as ERROR. A span with both a benign and a real exception stays ERROR.
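The downgrade rule can be sketched as a pure decision function. This shows only the logic described above, under assumptions: the function and constant names are hypothetical, exception types are compared as fully qualified strings, and the "path" is assumed to be matched against the span name. How the bundled SpanProcessor actually rewrites the status is not covered here.

```python
BENIGN_EXCEPTION_TYPES = {"culsans.QueueShutDown"}
BENIGN_SPAN_PREFIX = "a2a.server.events.event_queue_v2."


def final_status(span_name: str, status: str, exception_types: list) -> str:
    """Return the status a span should end with after noise suppression."""
    if status != "ERROR":
        return status  # nothing to downgrade
    if not span_name.startswith(BENIGN_SPAN_PREFIX):
        return status  # only the known a2a-sdk code path is considered
    if exception_types and all(t in BENIGN_EXCEPTION_TYPES for t in exception_types):
        return "UNSET"  # every recorded exception is a control-flow signal
    return status  # at least one real exception: keep ERROR


# Hypothetical span name on the suppressed path.
span = "a2a.server.events.event_queue_v2.EventQueue.dequeue_event"
benign_only = final_status(span, "ERROR", ["culsans.QueueShutDown"])
mixed = final_status(span, "ERROR", ["culsans.QueueShutDown", "builtins.ValueError"])
```

Requiring every recorded exception to be benign is what keeps a span with both a benign and a real exception at ERROR.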
Disabling telemetry per-environment
Set the env var OTEL_EXPORTER_OTLP_ENDPOINT to empty in a specific environment to disable export at runtime without changing your vystak.py:
```shell
OTEL_EXPORTER_OTLP_ENDPOINT= vystak apply
```
The agent and channel containers detect the empty endpoint and skip OTel init entirely.
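A container-side check of this kind might look like the sketch below. The function name is hypothetical, and treating a missing variable the same as an empty one is an assumption of this example rather than documented behavior.

```python
import os
from typing import Mapping, Optional


def telemetry_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when a non-empty OTLP endpoint is configured."""
    env = os.environ if environ is None else environ
    # Assumption: missing, empty, and whitespace-only all mean "disabled".
    return bool(env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip())


enabled = telemetry_enabled({"OTEL_EXPORTER_OTLP_ENDPOINT": "http://my-collector:4317"})
disabled = telemetry_enabled({"OTEL_EXPORTER_OTLP_ENDPOINT": ""})
```

Guarding OTel initialization behind a check like this means a disabled environment never constructs exporters or instrumentors at all.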