# Set Up Observability
Omnia includes an optional observability stack with Prometheus, Grafana, Loki, and Tempo for comprehensive monitoring of your agent deployments.
## Prerequisites

- Kubernetes cluster with Helm 3.x
- Omnia Helm chart installed
## Enable the Observability Stack

The observability components are disabled by default. Enable them in your Helm values:
```yaml
prometheus:
  enabled: true
grafana:
  enabled: true
loki:
  enabled: true
tempo:
  enabled: true
alloy:
  enabled: true
```

Install or upgrade with these values:

```sh
helm upgrade --install omnia oci://ghcr.io/altairalabs/omnia \
  --namespace omnia-system \
  --create-namespace \
  -f values.yaml
```

## Access Grafana
### Port Forward

For development, port-forward to access Grafana:

```sh
kubectl port-forward svc/omnia-grafana 3000:80 -n omnia-system
```

Open http://localhost:3000 and log in with:

- Username: `admin`
- Password: `admin` (change this in production)
### Via Internal Gateway

If you’ve enabled the internal gateway (with Istio), Grafana is available at `/grafana`:

```sh
kubectl get gateway omnia-internal -n omnia-system -o jsonpath='{.status.addresses[0].value}'
```

Then access `http://<gateway-ip>:8080/grafana/`.
## View Agent Metrics

Omnia agents expose Prometheus metrics automatically on the `/metrics` endpoint. Metrics are organized into several categories.
### Facade Metrics

Connection and session metrics from the WebSocket facade:

| Metric | Type | Labels | Description |
|---|---|---|---|
| `omnia_agent_connections_active` | Gauge | - | Current WebSocket connections |
| `omnia_agent_connections_total` | Counter | - | Total connections since startup |
| `omnia_agent_requests_inflight` | Gauge | - | Pending LLM requests |
| `omnia_agent_request_duration_seconds` | Histogram | - | Request latency |
| `omnia_agent_messages_received_total` | Counter | - | Messages received |
| `omnia_agent_messages_sent_total` | Counter | - | Messages sent |
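The facade metrics above can also drive alerting. The following is a sketch of a standard Prometheus rule group; the alert name, threshold, and duration are hypothetical and should be tuned to your workload:

```yaml
groups:
  - name: omnia-facade
    rules:
      # Hypothetical alert: fire when P95 request latency stays above 2s for 10 minutes
      - alert: AgentRequestLatencyHigh
        expr: histogram_quantile(0.95, rate(omnia_agent_request_duration_seconds_bucket[5m])) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Agent P95 request latency above 2s"
```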
### LLM Metrics

Token usage and cost metrics from LLM provider calls (via PromptKit SDK collector):

| Metric | Type | Labels | Description |
|---|---|---|---|
| `omnia_provider_input_tokens_total` | Counter | provider, model | Input tokens sent to LLMs |
| `omnia_provider_output_tokens_total` | Counter | provider, model | Output tokens received |
| `omnia_provider_requests_total` | Counter | provider, model, status | Total LLM requests |
| `omnia_provider_cost_total` | Counter | provider, model | Estimated cost in USD |
| `omnia_provider_request_duration_seconds` | Histogram | provider, model | LLM request duration |
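As an illustration, the cost counter can be turned into a simple spend check; the $10/hour budget below is purely a placeholder:

```promql
# Estimated LLM spend over the last hour, per provider
sum by (provider) (increase(omnia_provider_cost_total[1h]))

# Alert expression sketch: total hourly spend above a (hypothetical) $10 budget
sum(increase(omnia_provider_cost_total[1h])) > 10
```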
### Runtime Metrics

Detailed metrics for pipelines, stages, tools, and validations:

**Pipeline Metrics:**

| Metric | Type | Labels | Description |
|---|---|---|---|
| `omnia_runtime_pipelines_active` | Gauge | - | Currently active pipelines |
| `omnia_runtime_pipeline_duration_seconds` | Histogram | status | Pipeline execution duration |

**Stage Metrics:**

| Metric | Type | Labels | Description |
|---|---|---|---|
| `omnia_runtime_stage_elements_total` | Counter | stage, status | Total stage executions |
| `omnia_runtime_stage_duration_seconds` | Histogram | stage, stage_type | Stage execution duration |

**Tool Metrics:**

| Metric | Type | Labels | Description |
|---|---|---|---|
| `omnia_runtime_tool_calls_total` | Counter | tool, status | Total tool invocations |
| `omnia_runtime_tool_call_duration_seconds` | Histogram | tool | Tool execution duration |

**Validation Metrics:**

| Metric | Type | Labels | Description |
|---|---|---|---|
| `omnia_runtime_validations_total` | Counter | validator, validator_type, status | Total validations |
| `omnia_runtime_validation_duration_seconds` | Histogram | validator, validator_type | Validation duration |
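A per-validator failure ratio can be derived from the validation counter. This sketch assumes the `status` label uses `error` for failures, as the tool metrics do:

```promql
# Validation failure ratio per validator over 5 minutes
sum by (validator) (rate(omnia_runtime_validations_total{status="error"}[5m]))
  /
sum by (validator) (rate(omnia_runtime_validations_total[5m]))
```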
### Eval Worker Metrics

The eval worker exposes metrics for event processing, eval execution, sampling, and result persistence. These are available when `enterprise.evalWorker.enabled` is true. The eval worker pod includes Prometheus scrape annotations automatically.

**Event Processing:**

| Metric | Type | Labels | Description |
|---|---|---|---|
| `omnia_eval_worker_events_received_total` | Counter | event_type | Session events consumed from Redis Streams |
| `omnia_eval_worker_event_processing_duration_seconds` | Histogram | event_type | End-to-end time to process a stream event |
| `omnia_eval_worker_stream_lag` | Gauge | stream | Pending messages per Redis stream (consumer lag) |

**Eval Execution:**

| Metric | Type | Labels | Description |
|---|---|---|---|
| `omnia_eval_worker_evals_executed_total` | Counter | eval_type, trigger, status | Total eval executions (success/error) |
| `omnia_eval_worker_eval_duration_seconds` | Histogram | eval_type, trigger | Eval execution duration |
| `omnia_eval_worker_evals_sampled_total` | Counter | eval_type, decision | Sampling decisions (sampled vs skipped) |
| `omnia_eval_worker_results_written_total` | Counter | status | Eval results written to session-api |
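The stream lag gauge is a natural input for a backlog alert; the 1000-message threshold here is illustrative only:

```promql
# Sketch of a consumer-lag alert expression per Redis stream
max by (stream) (omnia_eval_worker_stream_lag) > 1000
```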
### Session API Metrics

The session-api exposes HTTP request metrics and event publishing metrics. Prometheus scrape annotations are included on the deployment by default.

**HTTP Requests:**

| Metric | Type | Labels | Description |
|---|---|---|---|
| `omnia_session_api_requests_total` | Counter | method, route, status_code | Total HTTP requests |
| `omnia_session_api_request_duration_seconds` | Histogram | method, route, status_code | HTTP request duration |

**Event Publishing (requires Redis):**

| Metric | Type | Labels | Description |
|---|---|---|---|
| `omnia_session_api_events_published_total` | Counter | status | Redis stream publish attempts (success/error) |
| `omnia_session_api_event_publish_duration_seconds` | Histogram | - | Time to publish an event to Redis Streams |
## Query Metrics in Grafana

- Open Grafana and go to Explore
- Select the Prometheus datasource
- Try these queries:

```promql
# Active connections
omnia_agent_connections_active

# Request rate
rate(omnia_agent_requests_total[5m])

# P95 request latency
histogram_quantile(0.95, rate(omnia_agent_request_duration_seconds_bucket[5m]))

# LLM cost per model (last hour)
sum by (model) (increase(omnia_provider_cost_total[1h]))

# Token usage rate by provider
sum by (provider) (rate(omnia_provider_input_tokens_total[5m]) + rate(omnia_provider_output_tokens_total[5m]))

# Tool call error rate
sum(rate(omnia_runtime_tool_calls_total{status="error"}[5m])) / sum(rate(omnia_runtime_tool_calls_total[5m]))

# Median (P50) pipeline duration
histogram_quantile(0.5, rate(omnia_runtime_pipeline_duration_seconds_bucket[5m]))

# Eval worker: evals per second by type
sum by (eval_type) (rate(omnia_eval_worker_evals_executed_total[5m]))

# Eval worker: eval error rate
sum(rate(omnia_eval_worker_evals_executed_total{status="error"}[5m])) / sum(rate(omnia_eval_worker_evals_executed_total[5m]))

# Eval worker: consumer lag across all streams
omnia_eval_worker_stream_lag

# Eval worker: P95 eval duration by type
histogram_quantile(0.95, rate(omnia_eval_worker_eval_duration_seconds_bucket[5m]))

# Session API: request rate by route
sum by (route) (rate(omnia_session_api_requests_total[5m]))

# Session API: P99 request latency
histogram_quantile(0.99, rate(omnia_session_api_request_duration_seconds_bucket[5m]))

# Session API: event publish error rate
rate(omnia_session_api_events_published_total{status="error"}[5m])
```

## View Agent Logs

Logs are collected by Alloy and stored in Loki.
### Query Logs in Grafana

- Open Grafana and go to Explore
- Select the Loki datasource
- Use LogQL queries:

```logql
{namespace="omnia-system", container="agent"}

{namespace="omnia-system"} |= "error"

{namespace="omnia-system", app_name="my-agent"}
```

## Agent Tracing with OpenTelemetry
The runtime container supports OpenTelemetry tracing for detailed visibility into conversations, LLM calls, and tool executions.

### Enable Tracing

Tracing is configured via environment variables on the AgentRuntime. The operator will pass these to the runtime container:

```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: AgentRuntime
metadata:
  name: my-agent
spec:
  # ... other config ...
  runtime:
    env:
      - name: OMNIA_TRACING_ENABLED
        value: "true"
      - name: OMNIA_TRACING_ENDPOINT
        value: "tempo.omnia-system.svc.cluster.local:4317"
      - name: OMNIA_TRACING_SAMPLE_RATE
        value: "1.0"
      - name: OMNIA_TRACING_INSECURE
        value: "true"
```

### Tracing Configuration Options
| Environment Variable | Description | Default |
|---|---|---|
| `OMNIA_TRACING_ENABLED` | Enable OpenTelemetry tracing | `false` |
| `OMNIA_TRACING_ENDPOINT` | OTLP collector endpoint (gRPC) | - |
| `OMNIA_TRACING_SAMPLE_RATE` | Sampling rate (0.0 to 1.0) | `1.0` |
| `OMNIA_TRACING_INSECURE` | Disable TLS for OTLP connection | `false` |
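For production you typically sample a fraction of traces and keep TLS enabled. A sketch of such a configuration (the collector endpoint is an example value, not an Omnia default):

```yaml
runtime:
  env:
    - name: OMNIA_TRACING_ENABLED
      value: "true"
    - name: OMNIA_TRACING_ENDPOINT
      value: "otel-collector.observability.svc.cluster.local:4317"  # example endpoint
    - name: OMNIA_TRACING_SAMPLE_RATE
      value: "0.1"  # keep 10% of traces
    # OMNIA_TRACING_INSECURE is left at its default (false), so TLS stays on
```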
### Span Types

The runtime creates three types of spans:

**Conversation Spans** (`conversation.turn`)

- Created for each message exchange
- Includes session ID, message length, response length
- Parent span for LLM and tool spans

**LLM Spans** (`llm.call`)

- Created for each LLM API call
- Includes model name, token counts (input/output), cost

**Tool Spans** (`tool.<name>`)

- Created for each tool execution
- Includes tool name, success/error status, result size
### Trace Attributes

Traces include rich metadata for debugging:

| Attribute | Description |
|---|---|
| `omnia.session_id` | Conversation session identifier |
| `llm.model` | LLM model used |
| `llm.input_tokens` | Input token count |
| `llm.output_tokens` | Output token count |
| `llm.cost_usd` | Estimated cost in USD |
| `tool.name` | Tool that was called |
| `tool.is_error` | Whether tool returned an error |
| `tool.result_size` | Size of tool result |
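These attributes can be used directly in TraceQL. For example, to pull every trace belonging to a single conversation (the session ID below is a placeholder):

```traceql
{ span.omnia.session_id = "session-123" }
```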
## View Traces in Tempo

Tempo collects distributed traces from agents.

### Query Traces in Grafana

- Open Grafana and go to Explore
- Select the Tempo datasource
- Search by:
  - Service name (e.g., `omnia-runtime-my-agent`)
  - Trace ID
  - Duration
  - Tags (e.g., `omnia.session_id`)
### Example Trace Query

Find slow conversations:

```traceql
{ duration > 5s && resource.service.name =~ "omnia-runtime.*" }
```

Find tool errors:

```traceql
{ span.tool.is_error = true }
```

## Production Considerations
Section titled “Production Considerations”Persistent Storage
Section titled “Persistent Storage”Enable persistent storage for production:
prometheus: server: persistentVolume: enabled: true size: 50Gi
loki: singleBinary: persistence: enabled: true size: 50Gi
tempo: persistence: enabled: true size: 10GiChange Grafana Password
Section titled “Change Grafana Password”grafana: adminPassword: your-secure-passwordOr use a secret:
grafana: admin: existingSecret: grafana-admin-secret userKey: admin-user passwordKey: admin-passwordEnable Loki Ruler (Log-based Alerting)
The Loki ruler is disabled by default to avoid startup issues in local development environments. For production deployments that need log-based alerting, enable it:

```yaml
loki:
  ruler:
    enabled: true
    storage:
      type: local
      local:
        directory: /var/loki/rules
    alertmanager_url: http://alertmanager:9093
  singleBinary:
    extraVolumes:
      - name: rules
        emptyDir: {}
    extraVolumeMounts:
      - name: rules
        mountPath: /var/loki/rules
```

With the ruler enabled, you can define:

- **Alerting rules**: fire alerts based on LogQL queries (e.g., error rate thresholds)
- **Recording rules**: pre-compute expensive queries for faster dashboards
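A minimal alerting-rule sketch for the ruler, using a LogQL rate over the agent logs (the alert name and one-error-line-per-second threshold are illustrative; with local ruler storage, rule files live under the configured rules directory):

```yaml
groups:
  - name: omnia-log-alerts
    rules:
      # Hypothetical alert: sustained error log volume from agent containers
      - alert: AgentErrorLogsHigh
        expr: sum(rate({namespace="omnia-system", container="agent"} |= "error" [5m])) > 1
        for: 10m
        labels:
          severity: warning
```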
For cloud storage backends (S3, GCS), configure ruler storage accordingly:

```yaml
loki:
  ruler:
    enabled: true
    storage:
      type: s3
      s3:
        bucketnames: loki-rules
        region: us-west-2
```

### Resource Limits
Adjust resources based on your cluster size:

```yaml
prometheus:
  server:
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 1000m
        memory: 1Gi

grafana:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi
```

### Disable Individual Components
You can enable only the components you need:

```yaml
prometheus:
  enabled: true
grafana:
  enabled: true
loki:
  enabled: false
tempo:
  enabled: false
alloy:
  enabled: false
```

### Use External Observability
If you have existing observability infrastructure, disable the subcharts and configure agents to export to your systems:

```yaml
prometheus:
  enabled: false
grafana:
  enabled: false
loki:
  enabled: false
tempo:
  enabled: false
```

Agent pods include Prometheus scrape annotations by default, so your existing Prometheus can scrape them automatically.
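If your external Prometheus uses the common annotation-based discovery pattern, a job along these lines would pick up the agent pods. This is a sketch of the standard `kubernetes_sd_configs` relabeling idiom, not an Omnia-specific configuration:

```yaml
scrape_configs:
  - job_name: omnia-agents
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [omnia-system]
    relabel_configs:
      # Keep only pods that opt in via the prometheus.io/scrape annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Honor a custom metrics path if one is annotated
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
```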