Set Up Observability

Omnia includes an optional observability stack with Prometheus, Grafana, Loki, and Tempo for comprehensive monitoring of your agent deployments.

Prerequisites:

  • Kubernetes cluster with Helm 3.x
  • Omnia Helm chart installed

The observability components are disabled by default. Enable them in your Helm values:

prometheus:
  enabled: true
grafana:
  enabled: true
loki:
  enabled: true
tempo:
  enabled: true
alloy:
  enabled: true

Install or upgrade with these values:

helm upgrade --install omnia oci://ghcr.io/altairalabs/omnia \
  --namespace omnia-system \
  --create-namespace \
  -f values.yaml

For development, port-forward to access Grafana:

kubectl port-forward svc/omnia-grafana 3000:80 -n omnia-system

Open http://localhost:3000 and log in with:

  • Username: admin
  • Password: admin (change this in production)

If you’ve enabled the internal gateway (with Istio), Grafana is available at /grafana:

kubectl get gateway omnia-internal -n omnia-system -o jsonpath='{.status.addresses[0].value}'

Then access http://<gateway-ip>:8080/grafana/

Omnia agents expose Prometheus metrics automatically. Key metrics include:

| Metric | Type | Description |
| --- | --- | --- |
| `omnia_agent_connections_active` | Gauge | Current WebSocket connections |
| `omnia_agent_connections_total` | Counter | Total connections since startup |
| `omnia_agent_requests_inflight` | Gauge | Pending LLM requests |
| `omnia_agent_request_duration_seconds` | Histogram | Request latency |
| `omnia_agent_messages_received_total` | Counter | Messages received |
| `omnia_agent_messages_sent_total` | Counter | Messages sent |
  1. Open Grafana and go to Explore
  2. Select the Prometheus datasource
  3. Try these queries:
omnia_agent_connections_active
rate(omnia_agent_requests_total[5m])
histogram_quantile(0.95, rate(omnia_agent_request_duration_seconds_bucket[5m]))
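
The `histogram_quantile` query above estimates a latency percentile from cumulative bucket counts. A minimal Python sketch of the interpolation Prometheus performs (simplified to a single series with no rate window; the bucket values below are invented for illustration):

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by bound,
    with the last bound being float('inf'), as in Prometheus *_bucket series.
    """
    total = buckets[-1][1]
    rank = q * total  # observations at or below the quantile
    lower_bound, prev_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                # Quantile falls in the open-ended bucket: clamp to the
                # highest finite bound, as Prometheus does.
                return buckets[-2][0]
            # Linear interpolation within the bucket.
            in_bucket = count - prev_count
            fraction = (rank - prev_count) / in_bucket if in_bucket else 0.0
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, prev_count = upper_bound, count
    return lower_bound

# 100 requests: 90 under 0.5s, 99 under 1.0s -> p95 lies in the 0.5..1.0s bucket
buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (float("inf"), 100)]
p95 = histogram_quantile(0.95, buckets)
```

This is also why percentile accuracy depends on bucket layout: the result is interpolated between bucket bounds, not computed from raw observations.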

Logs are collected by Alloy and stored in Loki.

  1. Open Grafana and go to Explore
  2. Select the Loki datasource
  3. Use LogQL queries:
{namespace="omnia-system", container="agent"}
{namespace="omnia-system"} |= "error"
{namespace="omnia-system", app_name="my-agent"}
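
LogQL also supports parsers and range aggregations. Two further examples, assuming the agent emits JSON-structured logs (the `level` field is an assumption about the log format, so adjust to what your agents actually emit):

```logql
{namespace="omnia-system", container="agent"} | json | level="error"
sum by (app_name) (rate({namespace="omnia-system"}[5m]))
```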

The runtime container supports OpenTelemetry tracing for detailed visibility into conversations, LLM calls, and tool executions.

Tracing is configured via environment variables on the AgentRuntime resource. The operator passes these through to the runtime container:

apiVersion: omnia.altairalabs.ai/v1alpha1
kind: AgentRuntime
metadata:
  name: my-agent
spec:
  # ... other config ...
  runtime:
    env:
      - name: OMNIA_TRACING_ENABLED
        value: "true"
      - name: OMNIA_TRACING_ENDPOINT
        value: "tempo.omnia-system.svc.cluster.local:4317"
      - name: OMNIA_TRACING_SAMPLE_RATE
        value: "1.0"
      - name: OMNIA_TRACING_INSECURE
        value: "true"

| Environment Variable | Description | Default |
| --- | --- | --- |
| `OMNIA_TRACING_ENABLED` | Enable OpenTelemetry tracing | `false` |
| `OMNIA_TRACING_ENDPOINT` | OTLP collector endpoint (gRPC) | - |
| `OMNIA_TRACING_SAMPLE_RATE` | Sampling rate (0.0 to 1.0) | `1.0` |
| `OMNIA_TRACING_INSECURE` | Disable TLS for OTLP connection | `false` |
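
A sample rate below 1.0 drops a deterministic fraction of traces at the head. As an illustration only (not Omnia's actual code), this is the kind of decision a ratio-based sampler such as OpenTelemetry's `TraceIdRatioBased` makes from the trace ID:

```python
def should_sample(trace_id: int, sample_rate: float) -> bool:
    """Deterministic head sampling: keep the trace if its 64-bit ID falls
    below sample_rate's share of the ID space. Every service that sees the
    same trace ID makes the same decision, so kept traces stay complete."""
    return trace_id < int(sample_rate * (1 << 64))

# sample_rate=1.0 keeps everything; 0.25 keeps the lowest quarter of IDs
keep_all = should_sample(2**63, 1.0)
keep_low = should_sample(2**60, 0.25)
drop_hi = should_sample(2**63, 0.25)
```

Because the decision is a pure function of the trace ID, no coordination between agents is needed.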

The runtime creates three types of spans:

Conversation Spans (conversation.turn)

  • Created for each message exchange
  • Includes session ID, message length, response length
  • Parent span for LLM and tool spans

LLM Spans (llm.call)

  • Created for each LLM API call
  • Includes model name, token counts (input/output), cost

Tool Spans (tool.<name>)

  • Created for each tool execution
  • Includes tool name, success/error status, result size
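
To make the hierarchy concrete, here is a toy span recorder (not the OpenTelemetry API) showing how one conversation turn parents an LLM span and a tool span; the attribute names match the trace metadata documented here, while the attribute values are invented:

```python
import contextlib
import itertools

spans = []   # flat list of recorded spans
_stack = []  # IDs of currently open spans, innermost last
_ids = itertools.count(1)

@contextlib.contextmanager
def span(name, attrs=None):
    # Record a span whose parent is the innermost open span, if any.
    span_id = next(_ids)
    spans.append({"id": span_id, "name": name,
                  "parent": _stack[-1] if _stack else None,
                  **(attrs or {})})
    _stack.append(span_id)
    try:
        yield
    finally:
        _stack.pop()

with span("conversation.turn", {"omnia.session_id": "sess-123"}):
    with span("llm.call", {"llm.model": "example-model", "llm.input_tokens": 42}):
        pass
    with span("tool.search", {"tool.name": "search", "tool.is_error": False}):
        pass
```

Both the `llm.call` and `tool.search` spans end up with the `conversation.turn` span as their parent, which is exactly the shape you will see in Tempo's trace view.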

Traces include rich metadata for debugging:

| Attribute | Description |
| --- | --- |
| `omnia.session_id` | Conversation session identifier |
| `llm.model` | LLM model used |
| `llm.input_tokens` | Input token count |
| `llm.output_tokens` | Output token count |
| `llm.cost_usd` | Estimated cost in USD |
| `tool.name` | Tool that was called |
| `tool.is_error` | Whether the tool returned an error |
| `tool.result_size` | Size of the tool result |

Tempo collects distributed traces from agents.

  1. Open Grafana and go to Explore
  2. Select the Tempo datasource
  3. Search by:
    • Service name (e.g., omnia-runtime-my-agent)
    • Trace ID
    • Duration
    • Tags (e.g., omnia.session_id)

Find slow conversations:

{ duration > 5s && resource.service.name =~ "omnia-runtime.*" }

Find tool errors:

{ span.tool.is_error = true }

Enable persistent storage for production:

prometheus:
  server:
    persistentVolume:
      enabled: true
      size: 50Gi
loki:
  singleBinary:
    persistence:
      enabled: true
      size: 50Gi
tempo:
  persistence:
    enabled: true
    size: 10Gi

Set a secure Grafana admin password:

grafana:
  adminPassword: your-secure-password

Or use a secret:

grafana:
  admin:
    existingSecret: grafana-admin-secret
    userKey: admin-user
    passwordKey: admin-password
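
The referenced Secret must contain keys matching `userKey` and `passwordKey`. A sketch of such a Secret (the namespace and password value are placeholders to replace with your own):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin-secret
  namespace: omnia-system
type: Opaque
stringData:
  admin-user: admin
  admin-password: your-secure-password
```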

Adjust resources based on your cluster size:

prometheus:
  server:
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 1000m
        memory: 1Gi
grafana:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi

You can enable only the components you need:

prometheus:
  enabled: true
grafana:
  enabled: true
loki:
  enabled: false
tempo:
  enabled: false
alloy:
  enabled: false

If you have existing observability infrastructure, disable the subcharts and configure agents to export to your systems:

prometheus:
  enabled: false
grafana:
  enabled: false
loki:
  enabled: false
tempo:
  enabled: false

Agent pods include Prometheus scrape annotations by default, so your existing Prometheus can scrape them automatically.
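
For reference, the conventional scrape annotations look like the following; the exact port and path used by Omnia agent pods are assumptions here, so verify them against your pod spec:

```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"
```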