Scale Agent Deployments

This guide covers manual scaling, autoscaling with HPA and KEDA, resource configuration, and session handling for Omnia agent deployments.

Manual Scaling

Set Replicas

Set a fixed replica count with the spec.runtime.replicas field:

apiVersion: omnia.altairalabs.ai/v1alpha1
kind: AgentRuntime
metadata:
  name: my-agent
spec:
  runtime:
    replicas: 3
  # ...

Or use kubectl:

kubectl patch agentruntime my-agent --type=merge \
  -p '{"spec":{"runtime":{"replicas":5}}}'
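
When the replica count comes from a script or CI variable, it can help to build the patch body in one place and validate the input first. The following is a minimal sketch; replicas_patch is a hypothetical helper name, not part of Omnia or kubectl.

```shell
# Hypothetical helper: print the merge-patch body for a given replica count.
replicas_patch() {
  local count=$1
  case $count in
    # Reject empty input, non-digits, and leading zeros (invalid in JSON).
    ''|*[!0-9]*|0[0-9]*)
      echo "replica count must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '{"spec":{"runtime":{"replicas":%s}}}\n' "$count"
}

replicas_patch 5   # prints {"spec":{"runtime":{"replicas":5}}}
```

The output can be passed straight to the patch command shown above, for example with -p "$(replicas_patch 5)".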

Automatic Scaling with HPA

Enable built-in HPA autoscaling:

spec:
  runtime:
    autoscaling:
      enabled: true
      type: hpa
      minReplicas: 2
      maxReplicas: 10
      targetMemoryUtilizationPercentage: 70
      targetCPUUtilizationPercentage: 90

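The HPA adjusts the replica count between minReplicas and maxReplicas to keep average memory and CPU utilization near the configured targets. Utilization is measured as a percentage of each pod's resource requests, so set requests (see Resource Configuration) before enabling autoscaling.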
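
To see how the targets above translate into a replica count, the standard Kubernetes HPA rule can be worked through by hand: for each metric, desired = ceil(currentReplicas × currentUtilization / targetUtilization), the largest proposal wins, and the result is clamped to the min/max bounds. The sketch below hard-codes the 70% and 90% targets from the example and ignores the HPA's tolerance band and stabilization windows; the function names are illustrative only.

```shell
# Ceiling of (replicas * current / target) using integer arithmetic.
desired_for_metric() {
  local replicas=$1 current=$2 target=$3
  echo $(( (replicas * current + target - 1) / target ))
}

# Combine memory and CPU: take the larger proposal, then clamp to [min, max].
# Arguments: current replicas, memory utilization %, CPU utilization %, min, max.
hpa_desired_replicas() {
  local replicas=$1 mem_util=$2 cpu_util=$3 min=$4 max=$5
  local by_mem by_cpu desired
  by_mem=$(desired_for_metric "$replicas" "$mem_util" 70)
  by_cpu=$(desired_for_metric "$replicas" "$cpu_util" 90)
  desired=$by_mem
  if [ "$by_cpu" -gt "$desired" ]; then desired=$by_cpu; fi
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# 4 replicas averaging 95% memory and 45% CPU utilization, bounds 2..10:
hpa_desired_replicas 4 95 45 2 10   # prints 6
```

Here memory drives the decision: 4 × 95 / 70 rounds up to 6, while CPU alone would propose only 2.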

Check HPA Status

kubectl get hpa
kubectl describe hpa my-agent

Advanced Scaling with KEDA

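To scale on custom metrics such as active connections, or to use scale-to-zero, use KEDA. The example below scales on a Prometheus query and keeps at least one replica running: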

spec:
  runtime:
    autoscaling:
      enabled: true
      type: keda
      minReplicas: 1
      maxReplicas: 20
      keda:
        pollingInterval: 30
        cooldownPeriod: 300
        triggers:
          - type: prometheus
            metadata:
              serverAddress: "http://prometheus:9090"
              query: 'sum(omnia_agent_connections_active{agent="my-agent"})'
              threshold: "10"

See Autoscaling Explained for detailed KEDA configuration.
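
As a rough guide to what threshold: "10" means in practice: assuming the trigger is handed to KEDA unchanged and uses KEDA's default AverageValue metric type, the query result is divided across replicas, so the target replica count is approximately ceil(total connections / threshold), clamped to the configured bounds. The function name below is illustrative, and the sketch ignores pollingInterval and cooldownPeriod timing.

```shell
# Approximate replica count for a total metric value and a per-replica threshold.
# Arguments: total metric value, threshold, minReplicas, maxReplicas.
keda_desired_replicas() {
  local total=$1 threshold=$2 min=$3 max=$4
  local desired=$(( (total + threshold - 1) / threshold ))
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

# 47 active connections, threshold 10, bounds 1..20:
keda_desired_replicas 47 10 1 20   # prints 5
```

With no active connections the result stays at the minimum of 1, and anything above 200 connections is capped at 20 replicas.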

Resource Configuration

Set Resource Limits

Configure CPU and memory for predictable performance:

spec:
  runtime:
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "1000m"
        memory: "512Mi"

Resource Guidelines

Workload	CPU Request	Memory Request
Light	250m	128Mi
Medium	500m	256Mi
Heavy	1000m	512Mi
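
Requests are what the scheduler reserves for each pod, so the cluster needs enough free capacity for the highest replica count you allow. A quick way to size this is to multiply the per-pod requests by the replica ceiling; the helper name below is illustrative.

```shell
# Print the total CPU (millicores) and memory (Mi) requested by N replicas.
# Arguments: replica count, CPU request in millicores, memory request in Mi.
total_requests() {
  local replicas=$1 cpu_millis=$2 memory_mi=$3
  echo "$(( replicas * cpu_millis ))m $(( replicas * memory_mi ))Mi"
}

# Medium workload (500m, 256Mi) at maxReplicas: 10
total_requests 10 500 256   # prints "5000m 2560Mi"
```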

Session Affinity

When running multiple replicas, make sure a client's session state is available on whichever replica handles its requests:

With Redis Sessions (Recommended)

Redis-backed sessions are stored outside the agent pods, so any replica can serve any session and no affinity is required:

spec:
  session:
    type: redis
    storeRef:
      name: redis-credentials

With Memory Sessions

In-memory sessions (not recommended for production) exist only on the replica that created them, so configure Service session affinity to keep each client on the same replica:

apiVersion: v1
kind: Service
metadata:
  name: my-agent
spec:
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600

Monitoring Scale

Check replica status:

kubectl get agentruntime my-agent -o wide

View status conditions:

kubectl describe agentruntime my-agent

View autoscaling metrics:

# HPA autoscaling
kubectl get hpa my-agent

# KEDA autoscaling (KEDA manages an HPA named keda-hpa-<name>)
kubectl get scaledobject my-agent
kubectl get hpa keda-hpa-my-agent

Next Steps

Autoscaling Explained - Deep dive into HPA vs KEDA
Set Up Observability - Monitor scaling metrics
AgentRuntime Reference - Full autoscaling configuration