
AgentRuntime CRD

The AgentRuntime custom resource defines an AI agent deployment in Kubernetes.

```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: AgentRuntime
```

Reference to the PromptPack containing agent prompts.

| Field | Type | Required |
|---|---|---|
| `promptPackRef.name` | string | Yes |
| `promptPackRef.version` | string | No |

```yaml
spec:
  promptPackRef:
    name: my-prompts
    version: "1.0.0"
```

A list of named provider references. Each entry maps a logical name to a Provider CRD. This enables centralized credential management, consistent configuration across agents, and explicit judge provider mapping for evals.

| Field | Type | Required |
|---|---|---|
| `providers[].name` | string | Yes |
| `providers[].providerRef.name` | string | Yes |
| `providers[].providerRef.namespace` | string | No (defaults to same namespace) |

The `name` field is a logical identifier used to look up providers by role:

| Name | Purpose |
|---|---|
| `default` | Primary LLM provider for the runtime |
| `judge` | LLM judge for eval execution |
| Any custom name | Referenced by name in PromptPack eval definitions |

```yaml
spec:
  providers:
    - name: default
      providerRef:
        name: claude-sonnet
    - name: judge
      providerRef:
        name: claude-haiku
        namespace: shared-providers  # Optional cross-namespace reference
```

See the Provider reference for details on configuring Provider CRDs (types, secrets, defaults, etc.).

Agent framework configuration. Specifies which runtime framework the agent uses.

| Field | Type | Default | Required |
|---|---|---|---|
| `framework.type` | string | `promptkit` | No |
| `framework.version` | string | - | No |
| `framework.image` | string | - | No |

```yaml
spec:
  framework:
    type: promptkit
    version: "1.0.0"  # Optional version pinning
    image: myregistry.io/omnia-runtime:v1.0.0  # Optional image override
```

Framework types:

| Type | Description |
|---|---|
| `promptkit` | Default framework using PromptKit (recommended) |
| `custom` | Custom framework (requires the `image` field) |

The `framework.image` field overrides the default runtime container image. It is:

  • Required when using type: custom
  • Optional for built-in frameworks when you need a private registry or custom build

WebSocket facade configuration.

| Field | Type | Default | Required |
|---|---|---|---|
| `facade.type` | string | `websocket` | Yes |
| `facade.port` | integer | `8080` | No |
| `facade.handler` | string | `runtime` | No |
| `facade.image` | string | - | No |

```yaml
spec:
  facade:
    type: websocket
    port: 8080
    handler: runtime
    image: myregistry.io/omnia-facade:v1.0.0  # Optional override
```

Handler modes:

| Mode | Description | Requires API Key |
|---|---|---|
| `runtime` | Production mode using the runtime framework | Yes |
| `demo` | Demo mode with simulated streaming responses | No |
| `echo` | Simple echo handler for testing connectivity | No |

The `facade.image` field overrides the default facade container image. Use it when:

  • Using a private container registry
  • Running a custom build of the facade
  • Pinning to a specific version different from the operator default

Optional media storage configuration for the facade. When enabled, clients can upload files via HTTP endpoints before referencing them in WebSocket messages.

| Field | Type | Default | Required |
|---|---|---|---|
| `facade.media.enabled` | boolean | `false` | No |
| `facade.media.storagePath` | string | `/var/omnia/media` | No |
| `facade.media.publicURL` | string | - | Yes (if enabled) |
| `facade.media.maxFileSize` | string | `10Mi` | No |
| `facade.media.defaultTTL` | duration | `24h` | No |

```yaml
spec:
  facade:
    type: websocket
    port: 8080
    media:
      enabled: true
      storagePath: /var/omnia/media
      publicURL: https://agent.example.com
      maxFileSize: 10Mi
      defaultTTL: 24h
```

Facade media storage is useful when:

  • You use a custom runtime without built-in media externalization
  • You need a runtime-agnostic upload endpoint
  • You want to avoid base64-encoding large files in WebSocket messages

Note: Runtimes like PromptKit have built-in media externalization, so facade media storage can remain disabled (the default).

The facade media configuration is passed to the container via environment variables:

| Variable | Description |
|---|---|
| `OMNIA_MEDIA_STORAGE_TYPE` | `none` (disabled) or `local` (enabled) |
| `OMNIA_MEDIA_STORAGE_PATH` | Directory for storing uploaded files |
| `OMNIA_MEDIA_PUBLIC_URL` | Base URL for generating download URLs |
| `OMNIA_MEDIA_MAX_FILE_SIZE` | Maximum upload size in bytes |
| `OMNIA_MEDIA_DEFAULT_TTL` | Default time-to-live for uploads |

Optional reference to a ToolRegistry resource.

| Field | Type | Required |
|---|---|---|
| `toolRegistryRef.name` | string | No |
| `toolRegistryRef.namespace` | string | No |

```yaml
spec:
  toolRegistryRef:
    name: agent-tools
    namespace: tools  # Optional
```

Session storage configuration.

| Field | Type | Default | Required |
|---|---|---|---|
| `session.type` | string | `memory` | No |
| `session.ttl` | duration | `24h` | No |
| `session.storeRef.name` | string | - | No |

```yaml
spec:
  session:
    type: redis
    ttl: 24h
    storeRef:
      name: redis-credentials
```

Session store types:

  • `memory` - In-memory (not recommended for production)
  • `redis` - Redis backend (recommended)
  • `postgres` - PostgreSQL backend

Media configuration for resolving mock:// URLs in mock provider responses.

| Field | Type | Default | Required |
|---|---|---|---|
| `media.basePath` | string | `/etc/omnia/media` | No |

```yaml
spec:
  media:
    basePath: /etc/omnia/media
```

The `basePath` sets the `OMNIA_MEDIA_BASE_PATH` environment variable, which the runtime uses to resolve `mock://` URLs to actual file paths. This is primarily used with the mock provider for testing multimodal responses.
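A minimal sketch of the resolution this enables, assuming mock URLs map directly onto paths under `basePath` (the runtime's actual mapping rules aren't documented here):

```python
from pathlib import Path

def resolve_mock_url(url: str, base_path: str = "/etc/omnia/media") -> Path:
    """Map a mock:// URL to a file under OMNIA_MEDIA_BASE_PATH.

    Illustrative sketch only, e.g.:
    mock://images/cat.png -> /etc/omnia/media/images/cat.png
    """
    prefix = "mock://"
    if not url.startswith(prefix):
        raise ValueError(f"not a mock URL: {url}")
    return Path(base_path) / url[len(prefix):]
```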

Deployment-related settings including replicas, resources, and autoscaling.

```yaml
spec:
  runtime:
    replicas: 3
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "1000m"
        memory: "512Mi"
    nodeSelector:
      node-type: agents
    tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "agents"
        effect: "NoSchedule"
```

Mount additional volumes in the runtime container for media files, mock configurations, or other data.

| Field | Type | Description |
|---|---|---|
| `runtime.volumes` | []Volume | Kubernetes Volume definitions |
| `runtime.volumeMounts` | []VolumeMount | Volume mounts for the runtime container |

```yaml
spec:
  runtime:
    volumes:
      - name: mock-media
        persistentVolumeClaim:
          claimName: media-pvc
      - name: mock-config
        configMap:
          name: mock-responses
    volumeMounts:
      - name: mock-media
        mountPath: /etc/omnia/media
        readOnly: true
      - name: mock-config
        mountPath: /etc/omnia/mock
        readOnly: true
```

Supported volume types include:

  • `persistentVolumeClaim` - Mount a PVC for persistent storage
  • `configMap` - Mount a ConfigMap as files
  • `secret` - Mount a Secret as files
  • `emptyDir` - Temporary storage (cleared on pod restart)

This is commonly used with the mock provider to mount media files (images, audio) and mock response configurations for testing.

Horizontal pod autoscaling configuration. Supports both standard HPA and KEDA.

| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Enable autoscaling |
| `type` | string | `hpa` | `hpa` or `keda` |
| `minReplicas` | integer | `1` | Minimum replicas (`0` for KEDA scale-to-zero) |
| `maxReplicas` | integer | `10` | Maximum replicas |
| `targetMemoryUtilizationPercentage` | integer | `70` | Memory target (HPA only) |
| `targetCPUUtilizationPercentage` | integer | `90` | CPU target (HPA only) |
| `scaleDownStabilizationSeconds` | integer | `300` | Scale-down cooldown (HPA only) |

HPA example:

```yaml
spec:
  runtime:
    autoscaling:
      enabled: true
      type: hpa
      minReplicas: 2
      maxReplicas: 10
      targetMemoryUtilizationPercentage: 70
      targetCPUUtilizationPercentage: 80
      scaleDownStabilizationSeconds: 300
```

KEDA example:

```yaml
spec:
  runtime:
    autoscaling:
      enabled: true
      type: keda
      minReplicas: 1  # Set to 0 for scale-to-zero
      maxReplicas: 20
      keda:
        pollingInterval: 15
        cooldownPeriod: 60
        triggers:
          - type: prometheus
            metadata:
              serverAddress: "http://prometheus-server:9090"
              query: 'sum(omnia_agent_connections_active{agent="my-agent"})'
              threshold: "10"
```

KEDA-specific configuration (only used when type: keda).

| Field | Type | Default | Description |
|---|---|---|---|
| `pollingInterval` | integer | `30` | Seconds between trigger checks |
| `cooldownPeriod` | integer | `300` | Seconds before scaling down |
| `triggers` | array | - | Custom KEDA triggers |

If no triggers are specified, a default Prometheus trigger scales based on `omnia_agent_connections_active`.

Prometheus trigger:

```yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: "http://prometheus:9090"
      query: 'sum(rate(requests_total[1m]))'
      threshold: "100"
```

Cron trigger:

```yaml
triggers:
  - type: cron
    metadata:
      timezone: "America/New_York"
      start: "0 8 * * 1-5"  # 8am weekdays
      end: "0 18 * * 1-5"   # 6pm weekdays
      desiredReplicas: "5"
```

Configures realtime eval execution for this agent. When enabled, session events trigger evaluation of live conversations against eval definitions in the referenced PromptPack. See Realtime Evals for the full architecture and Configure Realtime Evals for a step-by-step guide.

| Field | Type | Default | Required |
|---|---|---|---|
| `evals.enabled` | boolean | `false` | No |

```yaml
spec:
  evals:
    enabled: true
```

LLM judge evals resolve their provider from the AgentRuntime's `spec.providers` list. Add a provider named `judge` (or any custom name referenced in your PromptPack eval definitions):

```yaml
spec:
  providers:
    - name: default
      providerRef:
        name: claude-sonnet  # Primary LLM for the agent
    - name: judge
      providerRef:
        name: claude-haiku   # Cheap/fast model for eval judging
```

The eval worker resolves provider credentials from the referenced Provider CRDs and their associated Secrets.

Controls what percentage of sessions and turns are evaluated to manage cost.

| Field | Type | Default | Description |
|---|---|---|---|
| `evals.sampling.defaultRate` | integer (0-100) | `100` | Sampling percentage for lightweight (in-process) evals |
| `evals.sampling.extendedRate` | integer (0-100) | `10` | Sampling percentage for extended (model-powered) evals |

```yaml
spec:
  evals:
    sampling:
      defaultRate: 100   # Run all lightweight evals
      extendedRate: 10   # Sample 10% for extended evals (cost control)
```

Sampling uses deterministic hashing on `sessionID:turnIndex`, so the same session/turn always produces the same sampling decision. Lightweight evals (e.g., `content_includes`) are fast and free to run, so they use `defaultRate`. Extended evals (model-powered evaluations) incur API costs and latency, so `extendedRate` defaults to a lower value.
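The deterministic decision can be sketched like this (the eval worker's actual hash function isn't specified in this reference; SHA-256 is used here purely for illustration):

```python
import hashlib

def should_sample(session_id: str, turn_index: int, rate: int) -> bool:
    """Hash sessionID:turnIndex into a 0-99 bucket; sample if bucket < rate.

    Deterministic: the same session/turn always yields the same decision,
    so retries and replays never flip a sampling choice.
    """
    key = f"{session_id}:{turn_index}".encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % 100
    return bucket < rate
```

Because the decision depends only on the key and the rate, raising `extendedRate` later only adds newly sampled turns; it never un-samples turns that were already evaluated.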

Limits eval execution throughput to prevent runaway costs.

| Field | Type | Default | Description |
|---|---|---|---|
| `evals.rateLimit.maxEvalsPerSecond` | integer | `50` | Maximum evals executed per second |
| `evals.rateLimit.maxConcurrentJudgeCalls` | integer | `5` | Maximum concurrent LLM judge API calls |

```yaml
spec:
  evals:
    rateLimit:
      maxEvalsPerSecond: 50
      maxConcurrentJudgeCalls: 5
```

Configures how session completion is detected for on_session_complete evals.

| Field | Type | Default | Description |
|---|---|---|---|
| `evals.sessionCompletion.inactivityTimeout` | duration | `5m` | Duration after the last message before a session is considered complete |

```yaml
spec:
  evals:
    sessionCompletion:
      inactivityTimeout: 10m
```
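In other words, a session counts as complete once `inactivityTimeout` elapses with no new message. A sketch of the check, with a hypothetical Go-style duration parser (the operator's real parsing is handled by Kubernetes, not this helper):

```python
import re

def parse_duration(d: str) -> float:
    """Parse a Go-style duration like "5m" or "1h30m" into seconds.

    Hypothetical helper for illustration; supports only s/m/h units.
    """
    units = {"s": 1, "m": 60, "h": 3600}
    total = 0.0
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)([smh])", d):
        total += float(value) * units[unit]
    return total

def session_complete(seconds_since_last_message: float, timeout: str = "5m") -> bool:
    # A session is considered complete after inactivityTimeout of silence.
    return seconds_since_last_message >= parse_duration(timeout)
```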

Progressive rollout configuration. When rollout.candidate is set and differs from the current spec, the controller creates a candidate Deployment and progresses through the defined steps.

| Field | Type | Required | Description |
|---|---|---|---|
| `rollout.candidate` | object | No | Overrides for the candidate version |
| `rollout.candidate.promptPackVersion` | string | No | PromptPack version for the candidate |
| `rollout.candidate.providerRefs` | array | No | Provider overrides for the candidate |
| `rollout.candidate.toolRegistryRef` | object | No | ToolRegistry override for the candidate |
| `rollout.steps` | array | Yes | Ordered sequence of rollout actions |
| `rollout.steps[].setWeight` | integer | - | Set candidate traffic weight (0-100) |
| `rollout.steps[].pause` | object | - | Pause the rollout |
| `rollout.steps[].pause.duration` | string | No | Pause duration (e.g., `5m`). Omit for indefinite |
| `rollout.steps[].analysis` | object | - | Run a RolloutAnalysis template |
| `rollout.steps[].analysis.templateName` | string | Yes | Name of the RolloutAnalysis CRD |
| `rollout.steps[].analysis.args` | array | No | Argument overrides for the template |
| `rollout.stickySession` | object | No | Consistent routing for experiments |
| `rollout.stickySession.hashOn` | string | Yes | Header for consistent hashing (e.g., `x-user-id`) |
| `rollout.rollback` | object | No | Rollback configuration |
| `rollout.rollback.mode` | string | No | `automatic`, `manual` (default), or `disabled` |
| `rollout.rollback.cooldown` | string | No | Debounce duration (default: `5m`) |
| `rollout.trafficRouting` | object | No | Traffic management provider |
| `rollout.trafficRouting.istio.virtualService.name` | string | Yes | VirtualService to patch |
| `rollout.trafficRouting.istio.virtualService.routes` | array | Yes | Route names to manage |
| `rollout.trafficRouting.istio.destinationRule.name` | string | Yes | DestinationRule to patch |
```yaml
# Canary rollout with analysis
spec:
  promptPackRef:
    name: customer-support-pack
    version: "1.0.0"
  rollout:
    candidate:
      promptPackVersion: "2.0.0"
    steps:
      - setWeight: 10
      - pause:
          duration: "5m"
      - analysis:
          templateName: quality-check
      - setWeight: 50
      - pause:
          duration: "10m"
      - setWeight: 100
    rollback:
      mode: automatic
    trafficRouting:
      istio:
        virtualService:
          name: customer-support-vs
          routes: [primary]
        destinationRule:
          name: customer-support-dr
```

When candidate matches the current spec, the rollout is idle. Promotion copies candidate overrides into the main spec. Rollback reverts the candidate to match the current spec.
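Promotion can be pictured as a pure function over the spec. This sketch handles only the `promptPackVersion` override and is not the controller's actual code:

```python
import copy

def promote(spec: dict) -> dict:
    """Copy candidate overrides into the main spec and clear the candidate,
    leaving the rollout idle.

    Illustrative sketch: only promptPackVersion is handled here; the real
    controller also promotes providerRefs and toolRegistryRef overrides.
    """
    new_spec = copy.deepcopy(spec)
    candidate = new_spec.get("rollout", {}).get("candidate")
    if not candidate:
        return new_spec  # no candidate -> rollout already idle
    if "promptPackVersion" in candidate:
        new_spec["promptPackRef"]["version"] = candidate["promptPackVersion"]
    new_spec["rollout"].pop("candidate", None)
    return new_spec
```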

Status phase values:

| Value | Description |
|---|---|
| `Pending` | Resource created, waiting for dependencies |
| `Running` | Agent pods are running and ready |
| `Failed` | Deployment failed |

Replica status fields:

| Field | Description |
|---|---|
| `status.replicas.desired` | Desired replicas |
| `status.replicas.ready` | Ready replicas |
| `status.replicas.available` | Available replicas |

Rollout status fields:

| Field | Description |
|---|---|
| `status.rollout.active` | Whether a rollout is in progress |
| `status.rollout.currentStep` | Current step index |
| `status.rollout.currentWeight` | Current candidate traffic weight |
| `status.rollout.stableVersion` | Version serving stable traffic |
| `status.rollout.candidateVersion` | Version serving candidate traffic |

Condition types:

| Type | Description |
|---|---|
| `Ready` | Overall readiness |
| `DeploymentReady` | Deployment is ready |
| `ServiceReady` | Service is ready |
| `PromptPackReady` | Referenced PromptPack is valid |
| `ProviderReady` | Referenced Provider is valid |
| `ToolRegistryReady` | Referenced ToolRegistry is valid |
```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: AgentRuntime
metadata:
  name: production-agent
  namespace: agents
spec:
  promptPackRef:
    name: customer-service-prompts
    version: "2.1.0"
  providers:
    - name: default
      providerRef:
        name: claude-production
    - name: judge
      providerRef:
        name: claude-haiku
  toolRegistryRef:
    name: service-tools
  facade:
    type: websocket
    port: 8080
    handler: runtime
  session:
    type: redis
    ttl: 24h
    storeRef:
      name: redis-credentials
  evals:
    enabled: true
    sampling:
      defaultRate: 100
      extendedRate: 10
    rateLimit:
      maxEvalsPerSecond: 50
      maxConcurrentJudgeCalls: 5
    sessionCompletion:
      inactivityTimeout: 5m
  runtime:
    replicas: 3  # Ignored when autoscaling enabled
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "1000m"
        memory: "512Mi"
    autoscaling:
      enabled: true
      type: keda
      minReplicas: 1
      maxReplicas: 20
      keda:
        pollingInterval: 15
        cooldownPeriod: 120
        triggers:
          - type: prometheus
            metadata:
              serverAddress: "http://omnia-prometheus-server.omnia-system.svc.cluster.local/prometheus"
              query: 'sum(omnia_agent_connections_active{agent="production-agent",namespace="agents"}) or vector(0)'
              threshold: "10"
```