Controller Reconciliation
This document explains the reconciliation logic used by Omnia’s Kubernetes controllers.
Reconciliation Pattern
Omnia follows the Kubernetes controller reconciliation pattern:
- Watch - Monitor resources for changes
- Queue - Add changed resources to work queue
- Reconcile - Process each resource to achieve desired state
- Requeue - Retry on transient failures
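The four steps above can be sketched as a single worker loop. This is a minimal, in-memory illustration rather than Omnia's implementation: `TransientError`, `run_worker`, and the `max_attempts` limit are hypothetical names introduced here, and a real work queue would also delay each retry.

```python
from collections import deque


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_worker(queue, reconcile, max_attempts=3):
    """Drain the queue, requeueing keys whose reconcile raised TransientError."""
    attempts = {}
    done, gave_up = [], []
    while queue:
        key = queue.popleft()          # Queue: take the next changed resource
        try:
            reconcile(key)             # Reconcile: drive it toward desired state
            done.append(key)
        except TransientError:
            attempts[key] = attempts.get(key, 0) + 1
            if attempts[key] < max_attempts:
                queue.append(key)      # Requeue: try again later
            else:
                gave_up.append(key)
    return done, gave_up
```

Keys are namespaced names such as `default/agent-a`; the watch step is whatever feeds those keys into the queue.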
AgentRuntime Controller
Reconciliation Flow
flowchart TD
    A[AgentRuntime Change Detected] --> B{Validate Spec}
    B -->|Valid| C[Fetch PromptPack]
    B -->|Invalid| F1[Set Failed Status]
    C -->|Found| D{ToolRegistry Referenced?}
    C -->|Not Found| F1
    D -->|Yes| E[Fetch ToolRegistry]
    D -->|No| G[Build Deployment Spec]
    E -->|Found| G
    E -->|Not Found| F1
    G --> H[Create/Update Deployment]
    H --> I[Create/Update Service]
    I --> J[Update Status: Running]
Deployment Building
The controller builds the Deployment spec with:
- Container image - From operator configuration
- Environment variables:
  - OMNIA_AGENT_NAME - AgentRuntime name
  - OMNIA_NAMESPACE - Namespace
  - OMNIA_PROVIDER_* - Provider configuration
  - OMNIA_SESSION_* - Session configuration
- Volume mounts - PromptPack ConfigMap
- Resource limits - From spec
- Labels - For identification and selection
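As a rough sketch of how those inputs could combine into an `apps/v1` Deployment manifest, consider the function below. Only the `OMNIA_*` variable prefixes come from the list above; the function name, the label keys, the container name, the mount path, and the way provider and session keys are turned into variable suffixes are all assumptions made for illustration.

```python
def build_deployment(name, namespace, image, provider, session,
                     promptpack_configmap, resources):
    """Assemble a Deployment manifest as a plain dict (illustrative only)."""
    env = [
        {"name": "OMNIA_AGENT_NAME", "value": name},
        {"name": "OMNIA_NAMESPACE", "value": namespace},
    ]
    # Assumption: each provider/session setting becomes one prefixed variable.
    env += [{"name": f"OMNIA_PROVIDER_{k.upper()}", "value": str(v)}
            for k, v in sorted(provider.items())]
    env += [{"name": f"OMNIA_SESSION_{k.upper()}", "value": str(v)}
            for k, v in sorted(session.items())]

    # Assumption: label keys are hypothetical; the selector must match them.
    labels = {"app.kubernetes.io/name": "omnia-agent",
              "app.kubernetes.io/instance": name}

    container = {
        "name": "agent",
        "image": image,
        "env": env,
        "resources": resources,
        "volumeMounts": [{"name": "promptpack",
                          "mountPath": "/etc/omnia/promptpack",  # hypothetical path
                          "readOnly": True}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [container],
                    "volumes": [{"name": "promptpack",
                                 "configMap": {"name": promptpack_configmap}}],
                },
            },
        },
    }
```

Using the same label set for the selector and the pod template keeps the Deployment valid, since Kubernetes rejects a Deployment whose selector does not match its template labels.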
Status Updates
The controller updates status with:
- phase - Current lifecycle phase
- replicas - Desired and ready counts
- conditions - Detailed state information
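One possible shape for such a status is sketched below. The `Running` and `Failed` phases appear in the reconciliation flow; the `Pending` phase, the `Ready` condition type, the message strings, and the `build_status` helper are assumptions added only to make the example complete.

```python
def build_status(desired, ready, error=None):
    """Derive phase, replica counts, and one condition (illustrative only)."""
    if error is not None:
        phase, ready_status, message = "Failed", "False", str(error)
    elif ready >= desired:
        phase, ready_status, message = "Running", "True", "all replicas ready"
    else:
        # Assumption: "Pending" is a hypothetical in-between phase.
        phase, ready_status, message = (
            "Pending", "False", f"{ready}/{desired} replicas ready")
    return {
        "phase": phase,
        "replicas": {"desired": desired, "ready": ready},
        "conditions": [
            {"type": "Ready", "status": ready_status, "message": message},
        ],
    }
```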
Watched Resources
The AgentRuntime controller watches:
- AgentRuntime resources (primary)
- PromptPack resources (to detect changes)
- ToolRegistry resources (to detect changes)
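Watching a secondary resource usually means mapping each change back to the primary resources that depend on it. The sketch below shows that mapping for PromptPack changes; the `promptPackRef` field name and the assumption that references stay within one namespace are illustrative, not taken from Omnia's API.

```python
def agentruntimes_for_promptpack(agentruntimes, namespace, promptpack_name):
    """Return queue keys for AgentRuntimes that reference the changed PromptPack."""
    return sorted(
        f"{rt['namespace']}/{rt['name']}"
        for rt in agentruntimes
        # Assumption: hypothetical field name, same-namespace references only.
        if rt["namespace"] == namespace
        and rt.get("promptPackRef") == promptpack_name
    )
```

Each returned key would then be added to the AgentRuntime work queue, so that a PromptPack edit triggers reconciliation of the agents that use it.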
PromptPack Controller
Reconciliation Flow
flowchart TD
    A[PromptPack Change Detected] --> B[Fetch ConfigMap]
    B -->|Found| C{Validate Content}
    B -->|Not Found| F[Set Failed Status]
    C -->|Valid| D[Find Referencing AgentRuntimes]
    C -->|Invalid| F
    D --> E{Rollout Strategy?}
    E -->|Immediate| G[Update activeVersion]
    E -->|Canary| H[Update canaryVersion]
    G --> I[Notify Agents]
    H --> J{Weight = 100%?}
    J -->|Yes| K[Promote to Active]
    J -->|No| I
    K --> I
    I --> L[Update Status: Active]
Rollout Strategies
Immediate
Changes apply immediately:
- Validate new content
- Update activeVersion
- Agents pick up changes on next request
Canary
Gradual rollout:
- Validate new content
- Set canaryVersion
- Route a percentage of traffic to the canary
- Monitor for issues
- Promote when weight reaches 100%
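The canary steps above can be illustrated with two small functions. The source does not say how traffic is split, so the hash-based bucketing in `pick_version` is an assumed technique, and both function names and the `session_id` parameter are hypothetical; only `activeVersion`, `canaryVersion`, and promotion at 100% come from the text.

```python
import hashlib


def pick_version(session_id, active, canary, weight):
    """Send roughly `weight` percent of session IDs to the canary version."""
    if canary is None or weight <= 0:
        return active
    if weight >= 100:
        return canary
    # Assumption: a stable hash keeps each session on the same version.
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return canary if bucket < weight else active


def promote_if_complete(status, weight):
    """Promote canaryVersion to activeVersion once the weight reaches 100."""
    if status.get("canaryVersion") and weight >= 100:
        return {"activeVersion": status["canaryVersion"], "canaryVersion": None}
    return dict(status)
```

Hashing the session ID rather than drawing a random number means a given session sees one version consistently while the weight is unchanged.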
Watched Resources
The PromptPack controller watches:
- PromptPack resources (primary)
- ConfigMaps (to detect content changes)
ToolRegistry Controller
Reconciliation Flow
flowchart TD
    A[ToolRegistry Change Detected] --> B[Process Handlers]
    B --> C{Handler Type}
    C -->|HTTP/gRPC| D[Use Explicit Schema]
    C -->|MCP/OpenAPI| E[Discover Tools from Service]
    D --> F[Find Matching Services]
    E --> F
    F --> G{Services Available?}
    G -->|All| H[Status: Ready]
    G -->|Some| I[Status: Degraded]
    G -->|None| J[Status: Failed]
    H --> K[Update discoveredTools]
    I --> K
    J --> K
Tool Discovery
For selector-based tools:
- Find Services matching labels
- Parse tool metadata from annotations
- Determine endpoint URL from Service
- Check Service has ready endpoints
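A simplified version of these four steps is sketched below over plain dictionaries instead of live Service objects. The annotation key `omnia.example/tool-name`, the `readyEndpoints` count, the single `port` field, and the plain `http` scheme are all placeholders chosen for the example; the endpoint uses the default in-cluster DNS form `<service>.<namespace>.svc.cluster.local`.

```python
def discover_tools(services, selector):
    """Return one tool entry per Service whose labels match the selector."""
    tools = []
    for svc in services:
        labels = svc.get("labels", {})
        # Step 1: keep only Services carrying every selector label.
        if not all(labels.get(k) == v for k, v in selector.items()):
            continue
        annotations = svc.get("annotations", {})
        tools.append({
            # Step 2: tool metadata from annotations (hypothetical key).
            "name": annotations.get("omnia.example/tool-name", svc["name"]),
            # Step 3: endpoint URL derived from the Service identity.
            "endpoint": (f"http://{svc['name']}.{svc['namespace']}"
                         f".svc.cluster.local:{svc['port']}"),
            # Step 4: available only if at least one endpoint is ready.
            "available": svc.get("readyEndpoints", 0) > 0,
        })
    return tools
```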
Status Phases
| Phase | Condition |
|---|---|
| Ready | All tools available |
| Degraded | Some tools unavailable |
| Failed | No tools available |
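The table maps directly onto a small function. This sketch assumes availability has already been reduced to one boolean per tool; `registry_phase` is a name introduced here, and treating an empty registry as `Failed` is an assumption that follows from reading "No tools available" literally.

```python
def registry_phase(availability):
    """Map per-tool availability flags to Ready, Degraded, or Failed."""
    available = sum(1 for flag in availability if flag)
    if available and available == len(availability):
        return "Ready"       # all tools available
    if available:
        return "Degraded"    # some tools unavailable
    return "Failed"          # no tools available (including an empty list)
```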
Watched Resources
The ToolRegistry controller watches:
- ToolRegistry resources (primary)
- Services (to detect tool availability changes)
Error Handling
Transient Errors
On transient errors (network issues, API rate limits):
- Log the error
- Set status condition to reflect issue
- Requeue with exponential backoff
- Retry reconciliation
Permanent Errors
On permanent errors (invalid spec, missing resources):
- Log the error
- Set phase to Failed
- Set condition with error message
- Do not requeue (wait for spec change)
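The split between the two error classes can be sketched as one decision function. The exception classes, the `Ready` condition type, and the choice to treat unclassified errors as transient are assumptions for illustration; logging is left out to keep the example short.

```python
class TransientError(Exception):
    """Hypothetical: network issues, API rate limits."""


class PermanentError(Exception):
    """Hypothetical: invalid spec, missing resources."""


def handle_reconcile_error(err):
    """Return (status_patch, requeue) for a failed reconcile."""
    condition = {"type": "Ready", "status": "False", "message": str(err)}
    if isinstance(err, PermanentError):
        # Permanent: mark Failed and wait for a spec change instead of retrying.
        return {"phase": "Failed", "conditions": [condition]}, False
    # Transient (or unclassified, by assumption): record the issue and retry.
    return {"conditions": [condition]}, True
```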
Concurrency
Controllers can process resources concurrently:
- Default: 1 worker per controller
- Configurable via operator flags
- Safe: Each resource reconciled by one worker at a time
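The "one worker at a time" guarantee typically comes from the work queue rather than from the reconciler. The sketch below shows the idea with two sets, similar in spirit to the work queue in client-go; it is a single-threaded illustration with hypothetical names, and a real queue would guard these structures with a lock.

```python
from collections import deque


class KeyedQueue:
    """Queue that never hands the same key to two workers at once."""

    def __init__(self):
        self._queue = deque()
        self._dirty = set()        # keys that still need processing
        self._processing = set()   # keys currently held by a worker

    def add(self, key):
        if key in self._dirty:
            return                 # already pending: coalesce duplicate events
        self._dirty.add(key)
        if key not in self._processing:
            self._queue.append(key)

    def get(self):
        """Return the next key, or None if nothing is ready."""
        if not self._queue:
            return None
        key = self._queue.popleft()
        self._dirty.discard(key)
        self._processing.add(key)
        return key

    def done(self, key):
        self._processing.discard(key)
        if key in self._dirty:
            self._queue.append(key)  # changed while in flight: run it again
```

A key that changes while a worker holds it is deferred until `done` is called, so a second worker cannot pick it up in the meantime.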
Requeuing
Controllers requeue resources to:
- Retry after transient failures
- Check status of dependent resources
- Implement polling for external state
Requeue intervals:
| Scenario | Interval |
|---|---|
| Success | Not requeued |
| Transient error | 5s - 5m (exponential) |
| Waiting for dependency | 30s |
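For the transient-error row, one way to produce delays between 5s and 5m is to double the delay on each consecutive failure and cap the result. The table only says the backoff is exponential, so the doubling factor and the `backoff_seconds` helper are assumptions.

```python
def backoff_seconds(failures, base=5.0, cap=300.0):
    """Delay before the next retry after `failures` consecutive failures."""
    if failures < 1:
        raise ValueError("failures must be at least 1")
    # Clamp the exponent so very long failure streaks cannot overflow.
    exponent = min(failures - 1, 32)
    return min(cap, base * (2 ** exponent))
```

Under these assumptions the delays run 5s, 10s, 20s, 40s, 80s, 160s, and then stay at the 300s cap.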