AgentRuntime CRD

The AgentRuntime custom resource defines an AI agent deployment in Kubernetes.

```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: AgentRuntime
```

Reference to the PromptPack containing agent prompts.

| Field | Type | Required |
|-------|------|----------|
| `promptPackRef.name` | string | Yes |
| `promptPackRef.version` | string | No |
| `promptPackRef.track` | string | No (default: `"stable"`) |
```yaml
spec:
  promptPackRef:
    name: my-prompts
    version: "1.0.0"  # Or use track: "canary"
```

Reference to a Provider resource for LLM configuration. This is the recommended approach as it enables centralized credential management and consistent configuration across agents.

| Field | Type | Required |
|-------|------|----------|
| `providerRef.name` | string | Yes |
| `providerRef.namespace` | string | No (defaults to the same namespace) |
```yaml
spec:
  providerRef:
    name: claude-provider
    namespace: shared-providers  # Optional
```

Inline provider configuration. Prefer `providerRef` for production deployments.

| Field | Type | Required |
|-------|------|----------|
| `provider.type` | string | Yes (`claude`, `openai`, `gemini`, `auto`) |
| `provider.model` | string | No |
| `provider.secretRef.name` | string | Yes |
| `provider.secretRef.key` | string | No |
| `provider.defaults.temperature` | string | No |
| `provider.defaults.topP` | string | No |
| `provider.defaults.maxTokens` | integer | No |
```yaml
spec:
  provider:
    type: claude
    model: claude-sonnet-4-20250514
    secretRef:
      name: llm-credentials
    defaults:
      temperature: "0.7"
```

The secret should contain the appropriate API key:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: llm-credentials
stringData:
  ANTHROPIC_API_KEY: "sk-ant-..."  # For Claude
  # Or: OPENAI_API_KEY: "sk-..."   # For OpenAI
  # Or: GEMINI_API_KEY: "..."      # For Gemini
```

Note: If both providerRef and provider are specified, providerRef takes precedence.

WebSocket facade configuration.

| Field | Type | Default | Required |
|-------|------|---------|----------|
| `facade.type` | string | `websocket` | Yes |
| `facade.port` | integer | `8080` | No |
| `facade.handler` | string | `runtime` | No |
```yaml
spec:
  facade:
    type: websocket
    port: 8080
    handler: runtime
```
Handler modes:

| Mode | Description | Requires API Key |
|------|-------------|------------------|
| `runtime` | Production mode using the runtime framework | Yes |
| `demo` | Demo mode with simulated streaming responses | No |
| `echo` | Simple echo handler for testing connectivity | No |
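For instance, to exercise the WebSocket facade without any LLM credentials, the handler can be switched to `demo` — a sketch based on the fields above:

```yaml
spec:
  facade:
    type: websocket
    port: 8080
    handler: demo  # Simulated streaming responses; no API key required
```

Switching back to `handler: runtime` restores production behavior and requires a valid provider API key.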

Optional reference to a ToolRegistry resource.

| Field | Type | Required |
|-------|------|----------|
| `toolRegistryRef.name` | string | No |
| `toolRegistryRef.namespace` | string | No |
```yaml
spec:
  toolRegistryRef:
    name: agent-tools
    namespace: tools  # Optional
```

Session storage configuration.

| Field | Type | Default | Required |
|-------|------|---------|----------|
| `session.type` | string | `memory` | No |
| `session.ttl` | duration | `24h` | No |
| `session.storeRef.name` | string | - | No |
```yaml
spec:
  session:
    type: redis
    ttl: 24h
    storeRef:
      name: redis-credentials
```

Session store types:

  • memory - In-memory (not recommended for production)
  • redis - Redis backend (recommended)
  • postgres - PostgreSQL backend
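A PostgreSQL-backed session store follows the same shape — a sketch, assuming the `storeRef` Secret (here the hypothetical `postgres-credentials`) holds the connection details:

```yaml
spec:
  session:
    type: postgres
    ttl: 72h
    storeRef:
      name: postgres-credentials  # Hypothetical Secret with connection details
```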

Deployment-related settings including replicas, resources, and autoscaling.

```yaml
spec:
  runtime:
    replicas: 3
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "1000m"
        memory: "512Mi"
    nodeSelector:
      node-type: agents
    tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "agents"
        effect: "NoSchedule"
```

Horizontal pod autoscaling configuration. Supports both standard HPA and KEDA.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | boolean | `false` | Enable autoscaling |
| `type` | string | `hpa` | `hpa` or `keda` |
| `minReplicas` | integer | `1` | Minimum replicas (`0` for KEDA scale-to-zero) |
| `maxReplicas` | integer | `10` | Maximum replicas |
| `targetMemoryUtilizationPercentage` | integer | `70` | Memory target (HPA only) |
| `targetCPUUtilizationPercentage` | integer | `90` | CPU target (HPA only) |
| `scaleDownStabilizationSeconds` | integer | `300` | Scale-down cooldown (HPA only) |
HPA example:

```yaml
spec:
  runtime:
    autoscaling:
      enabled: true
      type: hpa
      minReplicas: 2
      maxReplicas: 10
      targetMemoryUtilizationPercentage: 70
      targetCPUUtilizationPercentage: 80
      scaleDownStabilizationSeconds: 300
```
KEDA example:

```yaml
spec:
  runtime:
    autoscaling:
      enabled: true
      type: keda
      minReplicas: 1  # Set to 0 for scale-to-zero
      maxReplicas: 20
      keda:
        pollingInterval: 15
        cooldownPeriod: 60
        triggers:
          - type: prometheus
            metadata:
              serverAddress: "http://prometheus-server:9090"
              query: 'sum(omnia_agent_connections_active{agent="my-agent"})'
              threshold: "10"
```

KEDA-specific configuration (only used when type: keda).

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `pollingInterval` | integer | `30` | Seconds between trigger checks |
| `cooldownPeriod` | integer | `300` | Seconds before scaling down |
| `triggers` | array | - | Custom KEDA triggers |

If no triggers are specified, a default Prometheus trigger scales based on omnia_agent_connections_active.
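A minimal sketch relying on that default: with `keda.triggers` omitted, the operator's built-in connection-count trigger applies, which pairs naturally with scale-to-zero:

```yaml
spec:
  runtime:
    autoscaling:
      enabled: true
      type: keda
      minReplicas: 0   # Scale to zero while no connections are active
      maxReplicas: 10
      # No keda.triggers: the default trigger on
      # omnia_agent_connections_active is used.
```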

Prometheus trigger:

```yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: "http://prometheus:9090"
      query: 'sum(rate(requests_total[1m]))'
      threshold: "100"
```

Cron trigger:

```yaml
triggers:
  - type: cron
    metadata:
      timezone: "America/New_York"
      start: "0 8 * * 1-5"   # 8am weekdays
      end: "0 18 * * 1-5"    # 6pm weekdays
      desiredReplicas: "5"
```
Status phase values:

| Value | Description |
|-------|-------------|
| `Pending` | Resource created, waiting for dependencies |
| `Running` | Agent pods are running and ready |
| `Failed` | Deployment failed |

Replica counts:

| Field | Description |
|-------|-------------|
| `status.replicas.desired` | Desired replicas |
| `status.replicas.ready` | Ready replicas |
| `status.replicas.available` | Available replicas |

Status conditions:

| Type | Description |
|------|-------------|
| `Ready` | Overall readiness |
| `DeploymentReady` | Deployment is ready |
| `ServiceReady` | Service is ready |
| `PromptPackReady` | Referenced PromptPack is valid |
| `ProviderReady` | Referenced Provider is valid |
| `ToolRegistryReady` | Referenced ToolRegistry is valid |
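A healthy AgentRuntime reporting these fields might look like the following — illustrative values only, not a guaranteed shape:

```yaml
status:
  phase: Running
  replicas:
    desired: 3
    ready: 3
    available: 3
  conditions:
    - type: Ready
      status: "True"
    - type: DeploymentReady
      status: "True"
```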
Complete example:

```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: AgentRuntime
metadata:
  name: production-agent
  namespace: agents
spec:
  promptPackRef:
    name: customer-service-prompts
    version: "2.1.0"
  providerRef:
    name: claude-production
  toolRegistryRef:
    name: service-tools
  facade:
    type: websocket
    port: 8080
    handler: runtime
  session:
    type: redis
    ttl: 24h
    storeRef:
      name: redis-credentials
  runtime:
    replicas: 3  # Ignored when autoscaling is enabled
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "1000m"
        memory: "512Mi"
    autoscaling:
      enabled: true
      type: keda
      minReplicas: 1
      maxReplicas: 20
      keda:
        pollingInterval: 15
        cooldownPeriod: 120
        triggers:
          - type: prometheus
            metadata:
              serverAddress: "http://omnia-prometheus-server.omnia-system.svc.cluster.local/prometheus"
              query: 'sum(omnia_agent_connections_active{agent="production-agent",namespace="agents"}) or vector(0)'
              threshold: "10"
```