Controller Reconciliation
This document explains the reconciliation logic used by Omnia’s Kubernetes controllers.
Reconciliation Pattern
Omnia follows the Kubernetes controller reconciliation pattern:
- Watch - Monitor resources for changes
- Queue - Add changed resources to a work queue
- Reconcile - Process each resource to achieve desired state
- Requeue - Retry on transient failures
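The four steps above can be sketched as a small loop. This is a minimal Python illustration of the pattern, not Omnia's controller code; the `Result` type and the `run_loop` helper are hypothetical, and the `namespace/name` key format is an assumption borrowed from common Kubernetes practice.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Set


@dataclass
class Result:
    requeue: bool = False  # hypothetical: ask the loop to retry this key


def run_loop(events: List[str], reconcile: Callable[[str], Result], max_rounds: int = 10) -> List[str]:
    """Drain a work queue of resource keys; return the order in which keys were reconciled."""
    queue: Deque[str] = deque()
    queued: Set[str] = set()

    # Watch + Queue: each change event adds its resource key; duplicates collapse.
    for key in events:
        if key not in queued:
            queue.append(key)
            queued.add(key)

    processed: List[str] = []
    rounds = 0
    while queue and rounds < max_rounds:
        rounds += 1
        key = queue.popleft()
        queued.discard(key)
        processed.append(key)
        # Reconcile: drive the actual state of this resource toward its desired state.
        result = reconcile(key)
        # Requeue: put the key back (real controllers delay this via a rate limiter).
        if result.requeue and key not in queued:
            queue.append(key)
            queued.add(key)
    return processed
```

Duplicate events for the same key collapse into one queue entry, which is why a controller reconciles toward the latest desired state rather than replaying every individual change.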
AgentRuntime Controller
Reconciliation Flow
```
AgentRuntime Change Detected
              │
              ▼
     ┌─────────────────┐
     │  Validate Spec  │
     └────────┬────────┘
              │
              ▼
     ┌─────────────────┐
     │ Fetch PromptPack│
     └────────┬────────┘
              │
              ▼
     ┌─────────────────┐
     │  Fetch ToolReg  │──(optional)
     └────────┬────────┘
              │
              ▼
     ┌─────────────────┐
     │ Build Deployment│
     └────────┬────────┘
              │
              ▼
     ┌─────────────────┐
     │  Create/Update  │
     │   Deployment    │
     └────────┬────────┘
              │
              ▼
     ┌─────────────────┐
     │  Create/Update  │
     │     Service     │
     └────────┬────────┘
              │
              ▼
     ┌─────────────────┐
     │  Update Status  │
     └─────────────────┘
```
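A compact sketch of this flow, assuming hypothetical spec fields `promptPackRef` and `toolRegistryRef` and a hypothetical `PermanentError` type. It illustrates two properties of the diagram: the ToolRegistry fetch runs only when a registry is referenced, and nothing is created or updated until validation and fetching have succeeded.

```python
from typing import Dict, List, Optional


class PermanentError(Exception):
    """Hypothetical error type for failures that retrying will not fix."""


def reconcile_agent_runtime(
    spec: Dict[str, str],
    prompt_packs: Dict[str, dict],
    tool_registries: Dict[str, dict],
) -> List[str]:
    """Walk the AgentRuntime flow in order and return the steps that ran."""
    steps: List[str] = []

    # Validate Spec: a PromptPack reference is required (hypothetical field name).
    pack_ref = spec.get("promptPackRef")
    if not pack_ref:
        raise PermanentError("spec.promptPackRef is required")
    steps.append("validate-spec")

    # Fetch PromptPack: a missing pack stops the flow before anything is written.
    if pack_ref not in prompt_packs:
        raise PermanentError(f"PromptPack {pack_ref!r} not found")
    steps.append("fetch-promptpack")

    # Fetch ToolReg (optional): skipped entirely when no registry is referenced.
    registry_ref: Optional[str] = spec.get("toolRegistryRef")
    if registry_ref:
        if registry_ref not in tool_registries:
            raise PermanentError(f"ToolRegistry {registry_ref!r} not found")
        steps.append("fetch-toolregistry")

    # Build, create/update the owned objects, and record status last.
    steps += ["build-deployment", "apply-deployment", "apply-service", "update-status"]
    return steps
```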
Deployment Building
The controller builds the Deployment spec with:
- Container image - From operator configuration
- Environment variables:
  - `OMNIA_AGENT_NAME` - AgentRuntime name
  - `OMNIA_NAMESPACE` - Namespace
  - `OMNIA_PROVIDER_*` - Provider configuration
  - `OMNIA_SESSION_*` - Session configuration
- Volume mounts - PromptPack ConfigMap
- Resource limits - From spec
- Labels - For identification and selection
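The environment-variable part of the Deployment could be assembled as below. The `OMNIA_*` names and prefixes come from the list above; the suffixes under `OMNIA_PROVIDER_` and `OMNIA_SESSION_` (such as `MODEL` or `TTL`) and the `build_env` helper are hypothetical.

```python
from typing import Dict, List


def build_env(
    name: str,
    namespace: str,
    provider: Dict[str, str],
    session: Dict[str, str],
) -> List[Dict[str, str]]:
    """Return container env entries in the {"name": ..., "value": ...} shape used by pod specs."""
    env = [
        {"name": "OMNIA_AGENT_NAME", "value": name},
        {"name": "OMNIA_NAMESPACE", "value": namespace},
    ]
    # Provider and session settings fan out under fixed prefixes (suffixes are hypothetical).
    for prefix, settings in (("OMNIA_PROVIDER_", provider), ("OMNIA_SESSION_", session)):
        for key in sorted(settings):  # sorted so the output is identical on every reconcile
            env.append({"name": prefix + key.upper(), "value": settings[key]})
    return env
```

Sorting the keys keeps the generated pod template stable between reconciles; a change in environment-variable order alone would alter the Deployment's pod template and trigger an unnecessary rollout.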
Status Updates
The controller updates status with:
- `phase` - Current lifecycle phase
- `replicas` - Desired and ready counts
- `conditions` - Detailed state information
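One way to derive these status fields from replica counts. The phase names `Pending` and `Running`, the `Available` condition type, and its reasons are assumptions for illustration; only the condition shape (`type`, `status` as `"True"`/`"False"`/`"Unknown"`, `reason`, `message`) follows the usual Kubernetes convention.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Condition:
    type: str
    status: str  # "True", "False" or "Unknown", as in Kubernetes conditions
    reason: str
    message: str = ""


@dataclass
class AgentRuntimeStatus:
    phase: str
    desired_replicas: int
    ready_replicas: int
    conditions: List[Condition] = field(default_factory=list)


def compute_status(desired: int, ready: int) -> AgentRuntimeStatus:
    """Hypothetical mapping: Running once every desired replica is ready, else Pending."""
    available = ready >= desired
    condition = Condition(
        type="Available",
        status="True" if available else "False",
        reason="ReplicasReady" if available else "ReplicasNotReady",
        message=f"{ready}/{desired} replicas ready",
    )
    return AgentRuntimeStatus("Running" if available else "Pending", desired, ready, [condition])
```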
Watched Resources
The AgentRuntime controller watches:
- AgentRuntime resources (primary)
- PromptPack resources (to detect changes)
- ToolRegistry resources (to detect changes)
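Watching PromptPack and ToolRegistry resources only helps if each change is mapped back to the AgentRuntimes that use it. The sketch below shows that mapping, assuming references are by name within the same namespace and using the hypothetical field names `promptPackRef` and `toolRegistryRef`. In Go controllers built on controller-runtime, a mapping like this is typically registered with `handler.EnqueueRequestsFromMapFunc`.

```python
from typing import Dict, List, NamedTuple


class Ref(NamedTuple):
    namespace: str
    name: str


# Hypothetical spec fields through which an AgentRuntime references each kind.
REF_FIELDS = {"PromptPack": "promptPackRef", "ToolRegistry": "toolRegistryRef"}


def agents_for_change(kind: str, changed: Ref, agents: Dict[Ref, Dict[str, str]]) -> List[Ref]:
    """Map a changed PromptPack or ToolRegistry to the AgentRuntimes to reconcile."""
    ref_field = REF_FIELDS[kind]
    return sorted(
        agent
        for agent, spec in agents.items()
        # Assumption: references never cross namespaces.
        if agent.namespace == changed.namespace and spec.get(ref_field) == changed.name
    )
```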
PromptPack Controller
Reconciliation Flow
```
 PromptPack Change Detected
              │
              ▼
     ┌─────────────────┐
     │ Fetch ConfigMap │
     └────────┬────────┘
              │
              ▼
     ┌─────────────────┐
     │ Validate Content│
     └────────┬────────┘
              │
              ▼
     ┌─────────────────┐
     │ Find Referencing│
     │  AgentRuntimes  │
     └────────┬────────┘
              │
              ▼
     ┌─────────────────┐
     │ Update Rollout  │
     │     Status      │
     └────────┬────────┘
              │
              ▼
     ┌─────────────────┐
     │  Notify Agents  │
     └─────────────────┘
```
Rollout Strategies
Immediate
Changes apply immediately:
- Validate new content
- Update `activeVersion`
- Agents pick up changes on the next request
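The Immediate strategy reduces to validate-then-swap. In this sketch the validation rule (a pack must contain at least one prompt) and the `RolloutStatus` type are hypothetical; the point is that invalid content never reaches `activeVersion`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RolloutStatus:
    active_version: Optional[str] = None


def validate(content: dict) -> None:
    # Hypothetical rule: a PromptPack needs at least one prompt.
    if not content.get("prompts"):
        raise ValueError("PromptPack content has no prompts")


def apply_immediate(status: RolloutStatus, version: str, content: dict) -> RolloutStatus:
    validate(content)  # raises before any change, so activeVersion stays untouched on failure
    status.active_version = version  # agents read this version on their next request
    return status
```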
Canary
Gradual rollout:
- Validate new content
- Set `canaryVersion`
- Route a percentage of traffic to the canary
- Monitor for issues
- Promote when weight reaches 100%
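The Canary steps can be sketched with a weight in percent. How Omnia actually splits traffic is not described here; this sketch assumes requests carry a session ID and hashes it into one of 100 buckets, so a given session keeps seeing the same version while the weight is unchanged. `Rollout`, `pick_version`, and `set_weight` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rollout:
    active_version: str
    canary_version: Optional[str] = None
    canary_weight: int = 0  # percent of traffic sent to the canary, 0..100


def pick_version(r: Rollout, session_id: str) -> str:
    """Route one request: stable per session, proportional to canary_weight overall."""
    if r.canary_version is None or r.canary_weight <= 0:
        return r.active_version
    bucket = int(hashlib.sha256(session_id.encode()).hexdigest(), 16) % 100
    return r.canary_version if bucket < r.canary_weight else r.active_version


def set_weight(r: Rollout, weight: int) -> Rollout:
    r.canary_weight = max(0, min(100, weight))
    # Promote: once the weight reaches 100%, the canary becomes the active version.
    if r.canary_weight == 100 and r.canary_version is not None:
        r.active_version, r.canary_version, r.canary_weight = r.canary_version, None, 0
    return r
```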
Watched Resources
The PromptPack controller watches:
- PromptPack resources (primary)
- ConfigMaps (to detect content changes)
ToolRegistry Controller
Reconciliation Flow
```
ToolRegistry Change Detected
              │
              ▼
     ┌─────────────────┐
     │ Process Inline  │
     │      Tools      │
     └────────┬────────┘
              │
              ▼
     ┌─────────────────┐
     │ Discover Tools  │
     │  via Selectors  │
     └────────┬────────┘
              │
              ▼
     ┌─────────────────┐
     │  Check Service  │
     │  Availability   │
     └────────┬────────┘
              │
              ▼
     ┌─────────────────┐
     │  Update Status  │
     └─────────────────┘
```
Tool Discovery
For selector-based tools:
- Find Services whose labels match the selector
- Parse tool metadata from annotations
- Determine the endpoint URL from the Service
- Check that the Service has ready endpoints
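A sketch of selector-based discovery over an in-memory list of Services. The annotation keys `tools.example/name` and `tools.example/path` and the default path are hypothetical; the DNS form `<service>.<namespace>.svc.cluster.local` is the standard in-cluster Service name, assuming the default `cluster.local` cluster domain.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical annotation keys; the keys Omnia actually reads may differ.
NAME_ANNOTATION = "tools.example/name"
PATH_ANNOTATION = "tools.example/path"


@dataclass
class Service:
    name: str
    namespace: str
    labels: Dict[str, str]
    annotations: Dict[str, str]
    port: int
    ready_endpoints: int  # number of ready addresses backing the Service


@dataclass
class DiscoveredTool:
    name: str
    url: str
    available: bool


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    # Equality-based matching: every selector pair must be present on the Service.
    return all(labels.get(k) == v for k, v in selector.items())


def discover(selector: Dict[str, str], services: List[Service]) -> List[DiscoveredTool]:
    tools: List[DiscoveredTool] = []
    for svc in services:
        if not matches(selector, svc.labels):
            continue
        name = svc.annotations.get(NAME_ANNOTATION, svc.name)  # fall back to the Service name
        path = svc.annotations.get(PATH_ANNOTATION, "/")
        url = f"http://{svc.name}.{svc.namespace}.svc.cluster.local:{svc.port}{path}"
        tools.append(DiscoveredTool(name, url, svc.ready_endpoints > 0))
    return tools
```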
Status Phases
| Phase | Condition |
|---|---|
| Ready | All tools available |
| Degraded | Some tools unavailable |
| Failed | No tools available |
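The phase table translates directly into a function over per-tool availability. The table does not say what happens for a registry with no tools at all; this sketch treats that case as `Failed`, which is an assumption. `registry_phase` is a hypothetical helper.

```python
from typing import List


def registry_phase(availability: List[bool]) -> str:
    """Map per-tool availability flags to a ToolRegistry phase."""
    ready = sum(availability)
    if ready == 0:
        return "Failed"  # no tools available (also covers an empty registry, by assumption)
    if ready < len(availability):
        return "Degraded"  # some tools unavailable
    return "Ready"  # all tools available
```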
Watched Resources
The ToolRegistry controller watches:
- ToolRegistry resources (primary)
- Services (to detect tool availability changes)
Error Handling
Transient Errors
On transient errors (network issues, API rate limits):
- Log the error
- Set status condition to reflect issue
- Requeue with exponential backoff
- Retry reconciliation
Permanent Errors
On permanent errors (invalid spec, missing resources):
- Log the error
- Set phase to `Failed`
- Set a condition with the error message
- Do not requeue (wait for spec change)
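The two error paths can be sketched as a single decision function. The `TransientError` and `PermanentError` types and the `Outcome` record are hypothetical; the sketch also treats unrecognised errors as transient, a deliberate choice so that an unexpected failure is retried rather than parked until the next spec change.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("reconciler")


class TransientError(Exception):
    """Hypothetical: network issues, API rate limits."""


class PermanentError(Exception):
    """Hypothetical: invalid spec, missing resources."""


@dataclass
class Outcome:
    phase: Optional[str]  # None means "leave the current phase unchanged"
    condition_message: str
    requeue: bool


def handle_error(err: Exception) -> Outcome:
    logger.error("reconcile failed: %s", err)  # both paths log the error
    if isinstance(err, PermanentError):
        # Wait for a spec change; the resulting watch event re-triggers reconciliation.
        return Outcome(phase="Failed", condition_message=str(err), requeue=False)
    # Transient (or unrecognised) errors: surface the issue and retry with backoff.
    return Outcome(phase=None, condition_message=str(err), requeue=True)
```

In Go controllers built on controller-runtime, returning a non-nil error from `Reconcile` already triggers a rate-limited requeue, so the permanent path usually records the failure in status and returns without an error.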
Concurrency
Controllers can process multiple resources concurrently:
- Default: 1 worker per controller, so resources are processed one at a time
- Configurable via operator flags
- Safe: each resource is reconciled by at most one worker at a time
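The per-resource guarantee comes from the work queue, not from the workers. The class below is a simplified, single-threaded model of the dirty/processing bookkeeping used by client-go's `workqueue` package: a key that changes while a worker holds it is remembered and handed out again only after that worker calls `done`. `KeyQueue` is a hypothetical name, and the model omits locking and rate limiting.

```python
from collections import deque
from typing import Deque, Optional, Set


class KeyQueue:
    """Minimal model of a queue that never hands the same key to two workers at once."""

    def __init__(self) -> None:
        self.queue: Deque[str] = deque()
        self.dirty: Set[str] = set()       # keys waiting to be processed
        self.processing: Set[str] = set()  # keys currently held by a worker

    def add(self, key: str) -> None:
        if key in self.dirty:
            return  # already pending: duplicate events collapse
        self.dirty.add(key)
        if key not in self.processing:
            self.queue.append(key)  # if a worker holds it, done() re-queues it instead

    def get(self) -> Optional[str]:
        if not self.queue:
            return None
        key = self.queue.popleft()
        self.processing.add(key)
        self.dirty.discard(key)
        return key

    def done(self, key: str) -> None:
        self.processing.discard(key)
        if key in self.dirty:
            self.queue.append(key)  # a change arrived mid-reconcile: process it again
```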
Requeuing
Controllers requeue resources to:
- Retry after transient failures
- Check status of dependent resources
- Implement polling for external state
Requeue intervals:
| Scenario | Interval |
|---|---|
| Success | Not requeued |
| Transient error | 5s - 5m (exponential) |
| Waiting for dependency | 30s |
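The transient-error row can be read as a doubling delay clamped to the table's range. The doubling factor is an assumption (the table only says exponential between 5s and 5m), and `requeue_delay` is a hypothetical helper.

```python
def requeue_delay(failures: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Seconds to wait before retrying after `failures` consecutive failures (1-based)."""
    if failures < 1:
        raise ValueError("failures must be >= 1")
    # 5s, 10s, 20s, ... doubling each time, never above the 5-minute cap.
    return min(cap, base * 2 ** (failures - 1))
```

With doubling, the cap is reached on the seventh consecutive failure: 5, 10, 20, 40, 80, 160, then 300 seconds.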