Architecture Overview
This document explains the architecture of Omnia and the design decisions behind it.
High-Level Architecture
Omnia consists of three main components:
```mermaid
graph TB
  subgraph cluster["Kubernetes Cluster"]
    subgraph operator["Omnia Operator"]
      op[Controller Manager]
    end
    subgraph pod["Agent Pod"]
      facade[Facade Container]
      runtime[Runtime Container]
      facade <-->|gRPC| runtime
    end
    op -->|creates| pod
    op -->|watches| pp[PromptPack ConfigMap]
    subgraph storage["Storage Layer"]
      session[(Session Store<br/>Redis)]
      tools[Tool Services]
    end
    facade --> session
    runtime --> tools
  end
  clients((Clients)) -->|WebSocket| facade
```
Components
Omnia Operator
The operator is a Kubernetes controller that:
- Watches for AgentRuntime, PromptPack, ToolRegistry, and Provider resources
- Creates and manages Deployments for agent pods
- Generates ConfigMaps for tools configuration
- Creates Services for agent access
- Monitors referenced resources and updates agents accordingly
The operator follows the standard Kubernetes controller pattern:
- Watch - Monitor custom resources for changes
- Reconcile - Bring actual state to desired state
- Status - Report current state back to the resource
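The pattern above can be sketched as a single reconcile pass. This is a framework-free illustration, not Omnia's actual implementation; the real operator uses standard Kubernetes controller machinery, and the resource shapes here are made up:

```python
from dataclasses import dataclass

@dataclass
class Desired:      # stands in for an AgentRuntime spec (illustrative)
    replicas: int

@dataclass
class Actual:       # stands in for the Deployment the operator manages
    replicas: int

def reconcile(desired: Desired, actual: Actual) -> str:
    """One reconcile pass: converge actual state toward desired state,
    then return the status to report back on the resource."""
    if actual.replicas != desired.replicas:
        actual.replicas = desired.replicas  # the "create or update" step
        return "Progressing"
    return "Ready"

actual = Actual(replicas=0)
print(reconcile(Desired(replicas=2), actual))  # first pass converges: Progressing
print(reconcile(Desired(replicas=2), actual))  # steady state: Ready
```

Each watch event triggers another pass; the loop is idempotent, so re-running it against an already-converged state is a no-op.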
Agent Pod (Sidecar Architecture)
Each agent pod runs two containers in a sidecar pattern:
Facade Container
The facade container handles external client communication:
- WebSocket Server - Manages client connections and message routing
- Session Management - Creates and tracks conversation sessions
- Protocol Translation - Converts WebSocket messages to gRPC calls
- Connection Lifecycle - Handles connect, disconnect, and heartbeat
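The protocol-translation step can be sketched as follows. The JSON message format and request fields are purely illustrative assumptions, not Omnia's actual wire format:

```python
import json

def translate(ws_message: str, session_id: str) -> dict:
    """Convert an inbound WebSocket text frame into the shape of a
    gRPC request to the runtime container. Field names are hypothetical."""
    payload = json.loads(ws_message)
    if payload.get("type") != "user_message":
        raise ValueError(f"unsupported message type: {payload.get('type')}")
    return {
        "session_id": session_id,            # correlates the call with the session
        "content": payload["content"],
        "metadata": payload.get("metadata", {}),
    }

req = translate('{"type": "user_message", "content": "hello"}', "sess-42")
print(req["session_id"], req["content"])
```

In the real facade, the resulting request would be sent over the localhost gRPC channel to the runtime container.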
Runtime Container
The runtime container handles LLM interactions and tool execution:
- PromptKit Integration - Uses PromptKit SDK for LLM communication
- Tool Manager - Loads and manages tool adapters (HTTP, gRPC, MCP, OpenAPI)
- State Persistence - Saves conversation state to the session store
- Tracing - OpenTelemetry instrumentation for observability
The containers communicate via gRPC on localhost, providing clean separation between client-facing logic and LLM processing.
Custom Resource Definitions
AgentRuntime
The primary resource for deploying agents. It references:
- Provider configuration (which LLM to use)
- PromptPack (what prompts to use)
- ToolRegistry (what tools are available)
- Session configuration
- Runtime resources and scaling
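A hypothetical manifest shows how these references fit together. The field names and API group below are illustrative, not taken from the actual CRD schema:

```yaml
apiVersion: omnia.example.com/v1alpha1   # illustrative group/version
kind: AgentRuntime
metadata:
  name: support-agent
spec:
  providerRef: { name: claude-prod }        # which LLM to use
  promptPackRef: { name: support-prompts }  # what prompts to use
  toolRegistryRef: { name: support-tools }  # what tools are available
  session:
    store: redis
  replicas: 2
```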
PromptPack
Defines versioned prompt configurations following the PromptPack specification. Supports:
- Structured prompt definitions with variables, parameters, and validators
- ConfigMap-based storage of compiled PromptPack JSON
- Canary rollouts for safe prompt updates
- Automatic agent notification on changes
ToolRegistry
Defines tool handlers available to agents:
- HTTP handlers - REST endpoints with explicit schemas
- gRPC handlers - gRPC services using the Tool protocol
- MCP handlers - Self-describing Model Context Protocol servers
- OpenAPI handlers - Self-describing services with OpenAPI specs
- Service discovery via label selectors
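As a sketch, a registry mixing an explicit HTTP handler with a self-describing MCP handler might look like this. The field names and API group are illustrative assumptions:

```yaml
apiVersion: omnia.example.com/v1alpha1   # illustrative group/version
kind: ToolRegistry
metadata:
  name: support-tools
spec:
  handlers:
    - name: lookup-order              # HTTP handler with an explicit schema
      type: http
      endpoint: http://orders.default.svc/lookup
    - name: knowledge-base            # self-describing MCP server,
      type: mcp                       # found via label-selector discovery
      selector:
        matchLabels:
          app: kb-mcp
```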
Provider
Configures LLM provider settings:
- Provider type (claude, openai, gemini, etc.)
- Model selection
- API credentials
- Custom base URLs
Tool Execution Flow
```mermaid
sequenceDiagram
  participant C as Client
  participant F as Facade
  participant R as Runtime
  participant TM as Tool Manager
  participant T as Tool Service
  C->>F: WebSocket message
  F->>R: gRPC request
  R->>R: Send to LLM
  R-->>R: LLM returns tool_call
  R->>TM: Execute tool
  TM->>T: Route to adapter (HTTP/gRPC/MCP/OpenAPI)
  T-->>TM: Tool result
  TM-->>R: Return result
  R->>R: Send result to LLM
  R-->>F: Stream response
  F-->>C: WebSocket chunks
```
The Tool Manager routes calls to the appropriate adapter based on handler type:
```mermaid
graph LR
  TM[Tool Manager] --> HTTP[HTTP Adapter]
  TM --> GRPC[gRPC Adapter]
  TM --> MCP[MCP Adapter]
  TM --> OA[OpenAPI Adapter]
  HTTP --> HS[REST Service]
  GRPC --> GS[gRPC Service]
  MCP --> MS[MCP Server]
  OA --> OS[OpenAPI Service]
```
- Client sends message via WebSocket
- Facade creates/resumes session and forwards to Runtime
- Runtime sends message to LLM via PromptKit
- LLM returns tool call request
- Tool Manager routes call to appropriate adapter
- Adapter executes tool and returns result
- Result sent back to LLM for final response
- Response streamed back through Facade to client
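The routing step (step 5 above) can be sketched as a dispatch table keyed on handler type. The adapters are stubbed out for illustration; the real ones speak HTTP, gRPC, MCP, or OpenAPI to the backing service:

```python
from typing import Callable

# Stub adapters: each would normally call the backing service over its
# protocol. Here they just echo so the routing logic is visible.
def http_adapter(call: dict) -> dict:
    return {"handler": "http", "result": f"POST {call['tool']}"}

def grpc_adapter(call: dict) -> dict:
    return {"handler": "grpc", "result": f"rpc {call['tool']}"}

ADAPTERS: dict[str, Callable[[dict], dict]] = {
    "http": http_adapter,
    "grpc": grpc_adapter,
    # "mcp" and "openapi" adapters would be registered the same way
}

def execute(call: dict) -> dict:
    """Route a tool call from the LLM to the adapter for its handler type."""
    adapter = ADAPTERS.get(call["handler_type"])
    if adapter is None:
        return {"handler": call["handler_type"], "error": "no adapter registered"}
    return adapter(call)

print(execute({"handler_type": "http", "tool": "lookup-order"}))
```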
Observability
Omnia provides comprehensive observability through OpenTelemetry:
Tracing
The runtime container creates spans for:
- Conversation turns - End-to-end request processing
- LLM calls - Time spent in provider API calls
- Tool executions - Individual tool call latency
Traces include:
- Session ID for correlation
- Token usage (input/output)
- Cost information
- Tool results (success/error)
Metrics
The operator and agent containers expose Prometheus metrics:
- Request latency histograms
- Tool call counts and durations
- Session counts
- LLM token usage
Configuration
Enable tracing via environment variables:
```yaml
env:
  - name: OMNIA_TRACING_ENABLED
    value: "true"
  - name: OMNIA_TRACING_ENDPOINT
    value: "otel-collector.observability:4317"
  - name: OMNIA_TRACING_SAMPLE_RATE
    value: "1.0"
```
Design Decisions
Why Kubernetes Operator?
We chose the operator pattern because:
- Native integration - Agents are first-class Kubernetes citizens
- Declarative configuration - Define desired state, not procedures
- Self-healing - Automatic recovery from failures
- Scalability - Leverage Kubernetes scaling mechanisms
Why Sidecar Architecture?
Separating facade and runtime enables:
- Separation of concerns - Client handling vs LLM processing
- Independent scaling - Different resource requirements
- Protocol flexibility - Easy to add new client protocols
- Testability - Components can be tested in isolation
- Language flexibility - Containers can use different languages
Why WebSocket?
WebSocket was chosen for the client facade because:
- Streaming - Essential for LLM response streaming
- Bidirectional - Enables tool calls and results
- Persistent - Maintains connection for multi-turn conversations
- Efficient - Lower overhead than HTTP polling
Why Separate PromptPack?
Separating prompts from agents allows:
- Reusability - Same prompts across multiple agents
- Versioning - Track prompt changes independently
- Safe rollouts - Canary deployments for prompts
- Separation of concerns - Prompt engineers vs DevOps
Why Handler-Based Tools?
The handler abstraction enables:
- Self-describing services - MCP and OpenAPI discover tools automatically
- Explicit schemas - HTTP and gRPC tools define their interface
- Unified management - All tool types in one registry
- Dynamic updates - Add/remove tools without redeploying agents
Resource Relationships
```mermaid
graph LR
  AR[AgentRuntime] -->|references| PP[PromptPack]
  AR -->|references| TR[ToolRegistry]
  AR -->|references| PR[Provider]
  AR -->|creates| D[Deployment]
  AR -->|creates| S[Service]
  PP -->|source| CM1[ConfigMap]
  TR -->|discovers| SVC[Services]
  TR -->|generates| CM2[Tools ConfigMap]
  PR -->|credentials| SEC[Secret]
  D -->|contains| FC[Facade Container]
  D -->|contains| RC[Runtime Container]
```
Reconciliation Flow
When an AgentRuntime is created or updated:
- Validate the referenced PromptPack exists
- Optionally validate the referenced ToolRegistry
- Fetch Provider configuration
- Generate tools ConfigMap from ToolRegistry
- Build the pod spec with facade and runtime containers
- Create or update the Deployment
- Create or update the Service
- Update the AgentRuntime status
When a ToolRegistry changes:
- Process handlers (HTTP, gRPC, MCP, OpenAPI)
- Discover tools from self-describing handlers
- Update discovered tools in status
- Find all AgentRuntimes referencing this ToolRegistry
- Regenerate tools ConfigMaps for affected agents
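The fan-out in the last two steps can be sketched as a simple filter over AgentRuntimes. Resource shapes are simplified to dicts for illustration:

```python
def affected_agents(agent_runtimes: list[dict], registry_name: str) -> list[str]:
    """Find AgentRuntimes whose spec references the changed ToolRegistry;
    each of these gets its tools ConfigMap regenerated."""
    return [
        ar["name"]
        for ar in agent_runtimes
        if ar.get("toolRegistryRef") == registry_name
    ]

runtimes = [
    {"name": "support-agent", "toolRegistryRef": "support-tools"},
    {"name": "billing-agent", "toolRegistryRef": "billing-tools"},
]
print(affected_agents(runtimes, "support-tools"))  # ['support-agent']
```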
Security Considerations
Secrets Management
- API keys are stored in Kubernetes Secrets
- Secrets are mounted as environment variables, not files
- Secrets can be from the same or different namespace
Network Policies
Consider implementing NetworkPolicies to:
- Restrict agent egress to allowed LLM providers
- Limit tool access to specific services
- Isolate agent namespaces
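For example, an egress policy limiting agent pods to HTTPS (for LLM provider APIs) and DNS might look like this; the namespace and pod label are assumptions:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress
  namespace: agents           # assumed agent namespace
spec:
  podSelector:
    matchLabels:
      app: omnia-agent        # assumed label on agent pods
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: TCP
          port: 443           # LLM provider APIs over HTTPS
    - ports:
        - protocol: UDP
          port: 53            # DNS resolution
```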
The operator requires specific permissions:
- Full access to Omnia CRDs
- Read access to ConfigMaps and Secrets
- Create/Update access to Deployments and Services
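In RBAC terms, those permissions correspond roughly to rules like the following; the CRD API group name is illustrative:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: omnia-operator
rules:
  - apiGroups: ["omnia.example.com"]   # illustrative CRD group
    resources: ["agentruntimes", "promptpacks", "toolregistries", "providers"]
    verbs: ["*"]                       # full access to Omnia CRDs
  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list", "watch"]    # read access
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
```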