
Set Up Observability

Omnia includes an optional observability stack with Prometheus, Grafana, Loki, and Tempo for comprehensive monitoring of your agent deployments.

Prerequisites:

  • Kubernetes cluster with Helm 3.x
  • Omnia Helm chart installed

The observability components are disabled by default. Enable them in your Helm values:

```yaml
prometheus:
  enabled: true
grafana:
  enabled: true
loki:
  enabled: true
tempo:
  enabled: true
alloy:
  enabled: true
```

Install or upgrade with these values:

```shell
helm upgrade --install omnia oci://ghcr.io/altairalabs/omnia \
  --namespace omnia-system \
  --create-namespace \
  -f values.yaml
```

For development, port-forward to access Grafana:

```shell
kubectl port-forward svc/omnia-grafana 3000:80 -n omnia-system
```

Open http://localhost:3000 and log in with:

  • Username: admin
  • Password: admin (change this in production)

If you’ve enabled the internal gateway (with Istio), Grafana is served under the /grafana path. Look up the gateway address:

```shell
kubectl get gateway omnia-internal -n omnia-system -o jsonpath='{.status.addresses[0].value}'
```

Then access http://<gateway-ip>:8080/grafana/

Omnia agents expose Prometheus metrics automatically on the /metrics endpoint. Metrics are organized into several categories.

Connection and session metrics from the WebSocket facade:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `omnia_agent_connections_active` | Gauge | - | Current WebSocket connections |
| `omnia_agent_connections_total` | Counter | - | Total connections since startup |
| `omnia_agent_requests_inflight` | Gauge | - | Pending LLM requests |
| `omnia_agent_request_duration_seconds` | Histogram | - | Request latency |
| `omnia_agent_messages_received_total` | Counter | - | Messages received |
| `omnia_agent_messages_sent_total` | Counter | - | Messages sent |

Token usage and cost metrics from LLM provider calls (via PromptKit SDK collector):

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `omnia_provider_input_tokens_total` | Counter | provider, model | Input tokens sent to LLMs |
| `omnia_provider_output_tokens_total` | Counter | provider, model | Output tokens received |
| `omnia_provider_requests_total` | Counter | provider, model, status | Total LLM requests |
| `omnia_provider_cost_total` | Counter | provider, model | Estimated cost in USD |
| `omnia_provider_request_duration_seconds` | Histogram | provider, model | LLM request duration |
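The cost counter is an estimate derived from token counts and per-model pricing. As a rough illustration of the arithmetic only (the price table below is a hypothetical placeholder, not Omnia's or any provider's actual pricing), a per-request estimate can be computed like this:

```python
# Illustrative sketch of deriving a cost estimate from token counts.
# Prices are placeholders expressed as (input, output) USD per 1K tokens.
PRICES_PER_1K = {
    ("example-provider", "example-model"): (0.005, 0.015),  # hypothetical
}

def estimate_cost(provider: str, model: str,
                  input_tokens: int, output_tokens: int) -> float:
    """Return an estimated request cost in USD for the given token counts."""
    in_price, out_price = PRICES_PER_1K[(provider, model)]
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price
```

A counter like `omnia_provider_cost_total` would then be incremented by this per-request estimate, labeled by provider and model.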

Detailed metrics for pipelines, stages, tools, and validations:

Pipeline Metrics:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `omnia_runtime_pipelines_active` | Gauge | - | Currently active pipelines |
| `omnia_runtime_pipeline_duration_seconds` | Histogram | status | Pipeline execution duration |

Stage Metrics:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `omnia_runtime_stage_elements_total` | Counter | stage, status | Total stage executions |
| `omnia_runtime_stage_duration_seconds` | Histogram | stage, stage_type | Stage execution duration |

Tool Metrics:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `omnia_runtime_tool_calls_total` | Counter | tool, status | Total tool invocations |
| `omnia_runtime_tool_call_duration_seconds` | Histogram | tool | Tool execution duration |

Validation Metrics:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `omnia_runtime_validations_total` | Counter | validator, validator_type, status | Total validations |
| `omnia_runtime_validation_duration_seconds` | Histogram | validator, validator_type | Validation duration |

The eval worker exposes metrics for event processing, eval execution, sampling, and result persistence. These are available when enterprise.evalWorker.enabled is true. The eval worker pod includes Prometheus scrape annotations automatically.

Event Processing:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `omnia_eval_worker_events_received_total` | Counter | event_type | Session events consumed from Redis Streams |
| `omnia_eval_worker_event_processing_duration_seconds` | Histogram | event_type | End-to-end time to process a stream event |
| `omnia_eval_worker_stream_lag` | Gauge | stream | Pending messages per Redis stream (consumer lag) |

Eval Execution:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `omnia_eval_worker_evals_executed_total` | Counter | eval_type, trigger, status | Total eval executions (success/error) |
| `omnia_eval_worker_eval_duration_seconds` | Histogram | eval_type, trigger | Eval execution duration |
| `omnia_eval_worker_evals_sampled_total` | Counter | eval_type, decision | Sampling decisions (sampled vs. skipped) |
| `omnia_eval_worker_results_written_total` | Counter | status | Eval results written to session-api |
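Sampling decisions (the `decision` label above) are often made deterministic per session, so every eval for a given session gets the same sampled/skipped outcome. A sketch of one common approach (not necessarily what Omnia does internally), hashing the session ID into a bucket:

```python
import hashlib

def sample_decision(session_id: str, rate: float) -> bool:
    """Deterministic sampling sketch: hash the session ID into [0, 1)
    and sample the session if the bucket falls below the target rate."""
    digest = hashlib.sha256(session_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

The same session ID always maps to the same bucket, so repeated events for one session never flip between sampled and skipped.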

The session-api exposes HTTP request metrics and event publishing metrics. Prometheus scrape annotations are included on the deployment by default.

HTTP Requests:

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `omnia_session_api_requests_total` | Counter | method, route, status_code | Total HTTP requests |
| `omnia_session_api_request_duration_seconds` | Histogram | method, route, status_code | HTTP request duration |

Event Publishing (requires Redis):

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| `omnia_session_api_events_published_total` | Counter | status | Redis stream publish attempts (success/error) |
| `omnia_session_api_event_publish_duration_seconds` | Histogram | - | Time to publish an event to Redis Streams |

  1. Open Grafana and go to Explore
  2. Select the Prometheus datasource
  3. Try these queries:
```promql
# Active connections
omnia_agent_connections_active

# Request rate
rate(omnia_agent_requests_total[5m])

# P95 request latency
histogram_quantile(0.95, rate(omnia_agent_request_duration_seconds_bucket[5m]))

# LLM cost per model (last hour)
sum by (model) (increase(omnia_provider_cost_total[1h]))

# Token usage rate by provider
sum by (provider) (rate(omnia_provider_input_tokens_total[5m]) + rate(omnia_provider_output_tokens_total[5m]))

# Tool call error rate
sum(rate(omnia_runtime_tool_calls_total{status="error"}[5m])) / sum(rate(omnia_runtime_tool_calls_total[5m]))

# Median (P50) pipeline duration
histogram_quantile(0.5, rate(omnia_runtime_pipeline_duration_seconds_bucket[5m]))

# Eval worker: evals per second by type
sum by (eval_type) (rate(omnia_eval_worker_evals_executed_total[5m]))

# Eval worker: eval error rate
sum(rate(omnia_eval_worker_evals_executed_total{status="error"}[5m])) / sum(rate(omnia_eval_worker_evals_executed_total[5m]))

# Eval worker: consumer lag across all streams
omnia_eval_worker_stream_lag

# Eval worker: P95 eval duration by type
histogram_quantile(0.95, rate(omnia_eval_worker_eval_duration_seconds_bucket[5m]))

# Session API: request rate by route
sum by (route) (rate(omnia_session_api_requests_total[5m]))

# Session API: P99 request latency
histogram_quantile(0.99, rate(omnia_session_api_request_duration_seconds_bucket[5m]))

# Session API: event publish error rate
rate(omnia_session_api_events_published_total{status="error"}[5m])
```
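The same metrics can drive alerting. Below is a sketch of Prometheus alerting rules built on the queries above; the thresholds and rule names are illustrative, and how you load rule files depends on your Prometheus deployment (for the bundled subchart, rules are typically supplied through its values).

```yaml
groups:
  - name: omnia-agents
    rules:
      - alert: HighToolErrorRate
        expr: |
          sum(rate(omnia_runtime_tool_calls_total{status="error"}[5m]))
            / sum(rate(omnia_runtime_tool_calls_total[5m])) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: More than 5% of tool calls are failing
      - alert: EvalWorkerLagging
        expr: omnia_eval_worker_stream_lag > 1000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: Eval worker is falling behind on stream {{ $labels.stream }}
```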

Logs are collected by Alloy and stored in Loki.

  1. Open Grafana and go to Explore
  2. Select the Loki datasource
  3. Use LogQL queries:
```logql
# Logs from agent containers
{namespace="omnia-system", container="agent"}

# All logs containing "error"
{namespace="omnia-system"} |= "error"

# Logs from a specific agent
{namespace="omnia-system", app_name="my-agent"}
```

The runtime container supports OpenTelemetry tracing for detailed visibility into conversations, LLM calls, and tool executions.

Tracing is configured via environment variables on the AgentRuntime; the operator passes these through to the runtime container:

```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: AgentRuntime
metadata:
  name: my-agent
spec:
  # ... other config ...
  runtime:
    env:
      - name: OMNIA_TRACING_ENABLED
        value: "true"
      - name: OMNIA_TRACING_ENDPOINT
        value: "tempo.omnia-system.svc.cluster.local:4317"
      - name: OMNIA_TRACING_SAMPLE_RATE
        value: "1.0"
      - name: OMNIA_TRACING_INSECURE
        value: "true"
```
| Environment Variable | Description | Default |
| --- | --- | --- |
| `OMNIA_TRACING_ENABLED` | Enable OpenTelemetry tracing | false |
| `OMNIA_TRACING_ENDPOINT` | OTLP collector endpoint (gRPC) | - |
| `OMNIA_TRACING_SAMPLE_RATE` | Sampling rate (0.0 to 1.0) | 1.0 |
| `OMNIA_TRACING_INSECURE` | Disable TLS for the OTLP connection | false |
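`OMNIA_TRACING_SAMPLE_RATE` behaves like conventional head-based probabilistic sampling: each new trace is kept with the given probability. A minimal sketch of the idea (not Omnia's actual implementation):

```python
import random

def should_sample(rate: float) -> bool:
    """Head-based sampling sketch: keep a trace with probability `rate`."""
    return random.random() < rate

# rate 1.0 keeps every trace; rate 0.0 keeps none;
# rate 0.1 keeps roughly one trace in ten
```

Lower sample rates reduce Tempo storage and export overhead at the cost of visibility into individual conversations.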

The runtime creates three types of spans:

Conversation Spans (conversation.turn)

  • Created for each message exchange
  • Includes session ID, message length, response length
  • Parent span for LLM and tool spans

LLM Spans (llm.call)

  • Created for each LLM API call
  • Includes model name, token counts (input/output), cost

Tool Spans (tool.<name>)

  • Created for each tool execution
  • Includes tool name, success/error status, result size

Traces include rich metadata for debugging:

| Attribute | Description |
| --- | --- |
| `omnia.session_id` | Conversation session identifier |
| `llm.model` | LLM model used |
| `llm.input_tokens` | Input token count |
| `llm.output_tokens` | Output token count |
| `llm.cost_usd` | Estimated cost in USD |
| `tool.name` | Tool that was called |
| `tool.is_error` | Whether the tool returned an error |
| `tool.result_size` | Size of the tool result |

Tempo collects distributed traces from agents.

  1. Open Grafana and go to Explore
  2. Select the Tempo datasource
  3. Search by:
    • Service name (e.g., omnia-runtime-my-agent)
    • Trace ID
    • Duration
    • Tags (e.g., omnia.session_id)

Find slow conversations:

```traceql
{ duration > 5s && resource.service.name =~ "omnia-runtime.*" }
```

Find tool errors:

```traceql
{ span.tool.is_error = true }
```

Enable persistent storage for production:

```yaml
prometheus:
  server:
    persistentVolume:
      enabled: true
      size: 50Gi
loki:
  singleBinary:
    persistence:
      enabled: true
      size: 50Gi
tempo:
  persistence:
    enabled: true
    size: 10Gi
grafana:
  adminPassword: your-secure-password
```

Or use a secret:

```yaml
grafana:
  admin:
    existingSecret: grafana-admin-secret
    userKey: admin-user
    passwordKey: admin-password
```
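The referenced secret is an ordinary Kubernetes Secret whose keys match `userKey` and `passwordKey`. A minimal sketch (the values here are placeholders; generate a real password):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin-secret
  namespace: omnia-system
type: Opaque
stringData:
  admin-user: admin
  admin-password: change-me  # placeholder; use a generated password
```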

The Loki ruler is disabled by default to avoid startup issues on local development environments. For production deployments that need log-based alerting, enable it:

```yaml
loki:
  ruler:
    enabled: true
    storage:
      type: local
      local:
        directory: /var/loki/rules
    alertmanager_url: http://alertmanager:9093
  singleBinary:
    extraVolumes:
      - name: rules
        emptyDir: {}
    extraVolumeMounts:
      - name: rules
        mountPath: /var/loki/rules
```

The ruler allows you to:

  • Alerting rules: Fire alerts based on LogQL queries (e.g., error rate thresholds)
  • Recording rules: Pre-compute expensive queries for faster dashboards
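Rule files placed in the configured rules directory use the standard Loki ruler format: Prometheus-style rule groups whose expressions are LogQL metric queries. A sketch (the threshold, names, and labels are illustrative):

```yaml
groups:
  - name: omnia-log-alerts
    rules:
      - alert: HighErrorLogRate
        expr: |
          sum(rate({namespace="omnia-system"} |= "error" [5m])) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Elevated error log volume in omnia-system
```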

For cloud storage backends (S3, GCS), configure ruler storage accordingly:

```yaml
loki:
  ruler:
    enabled: true
    storage:
      type: s3
      s3:
        bucketnames: loki-rules
        region: us-west-2
```

Adjust resources based on your cluster size:

```yaml
prometheus:
  server:
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 1000m
        memory: 1Gi
grafana:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi
```

You can enable only the components you need:

```yaml
prometheus:
  enabled: true
grafana:
  enabled: true
loki:
  enabled: false
tempo:
  enabled: false
alloy:
  enabled: false
```

If you have existing observability infrastructure, disable the subcharts and configure agents to export to your systems:

```yaml
prometheus:
  enabled: false
grafana:
  enabled: false
loki:
  enabled: false
tempo:
  enabled: false
```

Agent pods include Prometheus scrape annotations by default, so your existing Prometheus can scrape them automatically.
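These annotations follow the common `prometheus.io/*` convention that many Prometheus scrape configurations discover. A sketch of what they conventionally look like on a pod (the port shown is illustrative; check your agent pod spec for the actual metrics port):

```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "8080"
```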