AgentRuntime CRD
The AgentRuntime custom resource defines an AI agent deployment in Kubernetes.
API Version
Section titled “API Version”apiVersion: omnia.altairalabs.ai/v1alpha1kind: AgentRuntimeSpec Fields
Section titled “Spec Fields”promptPackRef
Section titled “promptPackRef”Reference to the PromptPack containing agent prompts.
| Field | Type | Required |
|---|---|---|
promptPackRef.name | string | Yes |
promptPackRef.version | string | No |
promptPackRef.track | string | No (default: “stable”) |
spec: promptPackRef: name: my-prompts version: "1.0.0" # Or use track: "canary"providerRef (Recommended)
Section titled “providerRef (Recommended)”Reference to a Provider resource for LLM configuration. This is the recommended approach as it enables centralized credential management and consistent configuration across agents.
| Field | Type | Required |
|---|---|---|
providerRef.name | string | Yes |
providerRef.namespace | string | No (defaults to same namespace) |
spec: providerRef: name: claude-provider namespace: shared-providers # Optionalprovider (Inline Configuration)
Section titled “provider (Inline Configuration)”Inline provider configuration. Use providerRef instead for production deployments.
| Field | Type | Required |
|---|---|---|
provider.type | string | Yes (claude, openai, gemini, auto) |
provider.model | string | No |
provider.secretRef.name | string | Yes |
provider.secretRef.key | string | No |
provider.defaults.temperature | string | No |
provider.defaults.topP | string | No |
provider.defaults.maxTokens | integer | No |
spec: provider: type: claude model: claude-sonnet-4-20250514 secretRef: name: llm-credentials defaults: temperature: "0.7"The secret should contain the appropriate API key:
apiVersion: v1kind: Secretmetadata: name: llm-credentialsstringData: ANTHROPIC_API_KEY: "sk-ant-..." # For Claude # Or: OPENAI_API_KEY: "sk-..." # For OpenAI # Or: GEMINI_API_KEY: "..." # For GeminiNote: If both
providerRefandproviderare specified,providerReftakes precedence.
facade
Section titled “facade”WebSocket facade configuration.
| Field | Type | Default | Required |
|---|---|---|---|
facade.type | string | websocket | Yes |
facade.port | integer | 8080 | No |
facade.handler | string | runtime | No |
spec: facade: type: websocket port: 8080 handler: runtimeHandler Modes
Section titled “Handler Modes”| Mode | Description | Requires API Key |
|---|---|---|
runtime | Production mode using the runtime framework | Yes |
demo | Demo mode with simulated streaming responses | No |
echo | Simple echo handler for testing connectivity | No |
toolRegistryRef
Section titled “toolRegistryRef”Optional reference to a ToolRegistry resource.
| Field | Type | Required |
|---|---|---|
toolRegistryRef.name | string | No |
toolRegistryRef.namespace | string | No |
spec: toolRegistryRef: name: agent-tools namespace: tools # Optionalsession
Section titled “session”Session storage configuration.
| Field | Type | Default | Required |
|---|---|---|---|
session.type | string | memory | No |
session.ttl | duration | 24h | No |
session.storeRef.name | string | - | No |
spec: session: type: redis ttl: 24h storeRef: name: redis-credentialsSession store types:
memory- In-memory (not recommended for production)redis- Redis backend (recommended)postgres- PostgreSQL backend
runtime
Section titled “runtime”Deployment-related settings including replicas, resources, and autoscaling.
spec: runtime: replicas: 3 resources: requests: cpu: "500m" memory: "256Mi" limits: cpu: "1000m" memory: "512Mi" nodeSelector: node-type: agents tolerations: - key: "dedicated" operator: "Equal" value: "agents" effect: "NoSchedule"runtime.autoscaling
Section titled “runtime.autoscaling”Horizontal pod autoscaling configuration. Supports both standard HPA and KEDA.
| Field | Type | Default | Description |
|---|---|---|---|
enabled | boolean | false | Enable autoscaling |
type | string | hpa | hpa or keda |
minReplicas | integer | 1 | Minimum replicas (0 for KEDA scale-to-zero) |
maxReplicas | integer | 10 | Maximum replicas |
targetMemoryUtilizationPercentage | integer | 70 | Memory target (HPA only) |
targetCPUUtilizationPercentage | integer | 90 | CPU target (HPA only) |
scaleDownStabilizationSeconds | integer | 300 | Scale-down cooldown (HPA only) |
Standard HPA Example
Section titled “Standard HPA Example”spec: runtime: autoscaling: enabled: true type: hpa minReplicas: 2 maxReplicas: 10 targetMemoryUtilizationPercentage: 70 targetCPUUtilizationPercentage: 80 scaleDownStabilizationSeconds: 300KEDA Example
Section titled “KEDA Example”spec: runtime: autoscaling: enabled: true type: keda minReplicas: 1 # Set to 0 for scale-to-zero maxReplicas: 20 keda: pollingInterval: 15 cooldownPeriod: 60 triggers: - type: prometheus metadata: serverAddress: "http://prometheus-server:9090" query: 'sum(omnia_agent_connections_active{agent="my-agent"})' threshold: "10"runtime.autoscaling.keda
Section titled “runtime.autoscaling.keda”KEDA-specific configuration (only used when type: keda).
| Field | Type | Default | Description |
|---|---|---|---|
pollingInterval | integer | 30 | Seconds between trigger checks |
cooldownPeriod | integer | 300 | Seconds before scaling down |
triggers | array | - | Custom KEDA triggers |
If no triggers are specified, a default Prometheus trigger scales based on omnia_agent_connections_active.
KEDA Trigger Types
Section titled “KEDA Trigger Types”Prometheus trigger:
triggers: - type: prometheus metadata: serverAddress: "http://prometheus:9090" query: 'sum(rate(requests_total[1m]))' threshold: "100"Cron trigger:
triggers: - type: cron metadata: timezone: "America/New_York" start: "0 8 * * 1-5" # 8am weekdays end: "0 18 * * 1-5" # 6pm weekdays desiredReplicas: "5"Status Fields
Section titled “Status Fields”| Value | Description |
|---|---|
Pending | Resource created, waiting for dependencies |
Running | Agent pods are running and ready |
Failed | Deployment failed |
replicas
Section titled “replicas”| Field | Description |
|---|---|
status.replicas.desired | Desired replicas |
status.replicas.ready | Ready replicas |
status.replicas.available | Available replicas |
conditions
Section titled “conditions”| Type | Description |
|---|---|
Ready | Overall readiness |
DeploymentReady | Deployment is ready |
ServiceReady | Service is ready |
PromptPackReady | Referenced PromptPack is valid |
ProviderReady | Referenced Provider is valid |
ToolRegistryReady | Referenced ToolRegistry is valid |
Complete Example
Section titled “Complete Example”apiVersion: omnia.altairalabs.ai/v1alpha1kind: AgentRuntimemetadata: name: production-agent namespace: agentsspec: promptPackRef: name: customer-service-prompts version: "2.1.0"
providerRef: name: claude-production
toolRegistryRef: name: service-tools
facade: type: websocket port: 8080 handler: runtime
session: type: redis ttl: 24h storeRef: name: redis-credentials
runtime: replicas: 3 # Ignored when autoscaling enabled resources: requests: cpu: "500m" memory: "256Mi" limits: cpu: "1000m" memory: "512Mi" autoscaling: enabled: true type: keda minReplicas: 1 maxReplicas: 20 keda: pollingInterval: 15 cooldownPeriod: 120 triggers: - type: prometheus metadata: serverAddress: "http://omnia-prometheus-server.omnia-system.svc.cluster.local/prometheus" query: 'sum(omnia_agent_connections_active{agent="production-agent",namespace="agents"}) or vector(0)' threshold: "10"