Scale Agent Deployments
This guide covers scaling strategies for Omnia agent deployments.
Manual Scaling
Set Replicas

Scale by adjusting the `runtime.replicas` field:

```yaml
apiVersion: omnia.altairalabs.ai/v1alpha1
kind: AgentRuntime
metadata:
  name: my-agent
spec:
  runtime:
    replicas: 3
  # ...
```

Or use kubectl:

```shell
kubectl patch agentruntime my-agent --type=merge \
  -p '{"spec":{"runtime":{"replicas":5}}}'
```

Automatic Scaling with HPA
Enable built-in HPA autoscaling:

```yaml
spec:
  runtime:
    autoscaling:
      enabled: true
      type: hpa
      minReplicas: 2
      maxReplicas: 10
      targetMemoryUtilizationPercentage: 70
      targetCPUUtilizationPercentage: 90
```

The HPA adjusts the replica count between `minReplicas` and `maxReplicas` based on CPU and memory utilization relative to the configured targets.
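To make the targets above concrete, here is a minimal sketch of the standard Kubernetes HPA calculation, `ceil(currentReplicas * currentUtilization / targetUtilization)`, evaluated per metric with the largest result winning. The utilization readings (95% CPU, 90% memory at 4 replicas) are hypothetical, and the sketch ignores the HPA's tolerance band and stabilization windows.

```shell
# ceil(replicas * current_utilization / target_utilization) using integer math
desired_replicas() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# Clamp a value into [min, max]
clamp() {
  if [ "$1" -lt "$2" ]; then echo "$2"
  elif [ "$1" -gt "$3" ]; then echo "$3"
  else echo "$1"
  fi
}

# Hypothetical snapshot: 4 replicas, CPU at 95%, memory at 90%
cpu=$(desired_replicas 4 95 90)   # CPU target is 90
mem=$(desired_replicas 4 90 70)   # memory target is 70

# The HPA takes the largest proposal across all metrics
if [ "$cpu" -gt "$mem" ]; then want=$cpu; else want=$mem; fi

# Apply minReplicas=2 and maxReplicas=10 from the configuration above
clamp "$want" 2 10   # prints 6
```

In this snapshot memory drives the decision: CPU alone would ask for 5 replicas, memory asks for 6.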
Check HPA Status

```shell
kubectl get hpa
kubectl describe hpa my-agent
```

Advanced Scaling with KEDA
For custom metrics and scale-to-zero capabilities, use KEDA:

```yaml
spec:
  runtime:
    autoscaling:
      enabled: true
      type: keda
      minReplicas: 1
      maxReplicas: 20
      keda:
        pollingInterval: 30
        cooldownPeriod: 300
        triggers:
          - type: prometheus
            metadata:
              serverAddress: "http://prometheus:9090"
              query: 'sum(omnia_agent_connections_active{agent="my-agent"})'
              threshold: "10"
```

See Autoscaling Explained for detailed KEDA configuration.
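As a rough illustration of how the `threshold` in the trigger above translates into replicas, the sketch below assumes the trigger behaves as a per-replica average target (KEDA's default metric type for the Prometheus scaler), so the desired count is approximately `ceil(total / threshold)`, clamped to `minReplicas` and `maxReplicas`. The connection counts are hypothetical, and the function name `keda_replicas` is illustrative only.

```shell
# Approximate replicas = ceil(total_metric / threshold), clamped to [min, max]
# Arguments: total_metric threshold min_replicas max_replicas
keda_replicas() {
  n=$(( ($1 + $2 - 1) / $2 ))
  if [ "$n" -lt "$3" ]; then n=$3; fi
  if [ "$n" -gt "$4" ]; then n=$4; fi
  echo "$n"
}

# threshold=10, minReplicas=1, maxReplicas=20 as in the configuration above
for connections in 0 45 250; do
  echo "$connections connections -> $(keda_replicas "$connections" 10 1 20) replicas"
done
```

With these inputs, 45 active connections yield 5 replicas, while 250 would call for 25 and is capped at the configured maximum of 20.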
Resource Configuration
Set Resource Limits

Configure CPU and memory for predictable performance:

```yaml
spec:
  runtime:
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "1000m"
        memory: "512Mi"
```

Resource Guidelines
| Workload | CPU Request | Memory Request |
|---|---|---|
| Light | 250m | 128Mi |
| Medium | 500m | 256Mi |
| Heavy | 1000m | 512Mi |
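When combining these requests with autoscaling, it can help to check what the cluster must be able to schedule at the maximum replica count. The sketch below multiplies the Medium profile from the table by a hypothetical `maxReplicas` of 10; the function name `total_request` is illustrative only.

```shell
# Total request = replicas * per-pod request
total_request() {
  echo $(( $1 * $2 ))
}

# Medium workload (500m CPU, 256Mi memory) at a hypothetical maxReplicas of 10
echo "CPU: $(total_request 10 500)m"       # prints CPU: 5000m
echo "Memory: $(total_request 10 256)Mi"   # prints Memory: 2560Mi
```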
Session Affinity
When running multiple replicas, make sure each client's session stays reachable, either through a shared session store or through service-level affinity:
With Redis Sessions (Recommended)
Redis-backed sessions work seamlessly with any replica:

```yaml
spec:
  session:
    type: redis
    storeRef:
      name: redis-credentials
```

With Memory Sessions
If using memory sessions (not recommended for production), configure service affinity:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-agent
spec:
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600
```

Monitoring Scale
Check replica status:

```shell
kubectl get agentruntime my-agent -o wide
```

View status conditions:

```shell
kubectl describe agentruntime my-agent
```

View autoscaling metrics:

```shell
# HPA
kubectl get hpa my-agent

# KEDA (KEDA manages an HPA named keda-hpa-<ScaledObject name>)
kubectl get scaledobject my-agent
kubectl get hpa keda-hpa-my-agent
```

Next Steps
- Autoscaling Explained - Deep dive into HPA vs KEDA
- Set Up Observability - Monitor scaling metrics
- AgentRuntime Reference - Full autoscaling configuration