
Architecture Overview

This document explains the architecture of Omnia and the design decisions behind it.

Omnia consists of three main components, shown in the diagram below: the operator, the agent pods, and the storage layer.

```mermaid
graph TB
    subgraph cluster["Kubernetes Cluster"]
        subgraph operator["Omnia Operator"]
            op[Controller Manager]
        end
        subgraph pod["Agent Pod"]
            facade[Facade Container]
            runtime[Runtime Container]
            facade <-->|gRPC| runtime
        end
        op -->|creates| pod
        op -->|watches| pp[PromptPack ConfigMap]
        subgraph storage["Storage Layer"]
            session[(Session Store<br/>Redis)]
            tools[Tool Services]
        end
        facade --> session
        runtime --> tools
    end
    clients((Clients)) -->|WebSocket| facade
```

The operator is a Kubernetes controller that:

  • Watches for AgentRuntime, PromptPack, ToolRegistry, and Provider resources
  • Creates and manages Deployments for agent pods
  • Generates ConfigMaps for tools configuration
  • Creates Services for agent access
  • Monitors referenced resources and updates agents accordingly

The operator follows the standard Kubernetes controller pattern:

  1. Watch - Monitor custom resources for changes
  2. Reconcile - Bring actual state to desired state
  3. Status - Report current state back to the resource

Each agent pod runs two containers in a sidecar pattern:

The facade container handles external client communication:

  • WebSocket Server - Manages client connections and message routing
  • Session Management - Creates and tracks conversation sessions
  • Protocol Translation - Converts WebSocket messages to gRPC calls
  • Connection Lifecycle - Handles connect, disconnect, and heartbeat

The runtime container handles LLM interactions and tool execution:

  • PromptKit Integration - Uses PromptKit SDK for LLM communication
  • Tool Manager - Loads and manages tool adapters (HTTP, gRPC, MCP, OpenAPI)
  • State Persistence - Saves conversation state to the session store
  • Tracing - OpenTelemetry instrumentation for observability

The containers communicate via gRPC on localhost, providing clean separation between client-facing logic and LLM processing.
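
As a rough sketch of the resulting topology (the operator actually manages this through a Deployment; the image names, port numbers, and container names below are assumptions for illustration, not the operator's actual output):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: support-agent-example
spec:
  containers:
    - name: facade
      image: ghcr.io/example/omnia-facade:latest    # assumed image
      ports:
        - name: websocket
          containerPort: 8080                       # client-facing WebSocket (assumed port)
    - name: runtime
      image: ghcr.io/example/omnia-runtime:latest   # assumed image
      ports:
        - name: grpc
          containerPort: 9090                       # reached by the facade over localhost (assumed port)
```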

AgentRuntime is the primary resource for deploying agents. It references:

  • Provider configuration (which LLM to use)
  • PromptPack (what prompts to use)
  • ToolRegistry (what tools are available)
  • Session configuration
  • Runtime resources and scaling
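
A hypothetical AgentRuntime manifest tying these references together. The API group, field names, and values are illustrative assumptions, not the authoritative schema:

```yaml
apiVersion: omnia.example.com/v1alpha1   # assumed group/version
kind: AgentRuntime
metadata:
  name: support-agent
spec:
  providerRef:
    name: claude-provider          # which LLM to use
  promptPackRef:
    name: support-prompts          # what prompts to use
  toolRegistryRef:
    name: support-tools            # what tools are available
  session:
    ttl: 30m                       # session configuration (assumed field)
  replicas: 2                      # runtime scaling
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      memory: 512Mi
```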

PromptPack defines versioned prompt configurations following the PromptPack specification. It supports:

  • Structured prompt definitions with variables, parameters, and validators
  • ConfigMap-based storage of compiled PromptPack JSON
  • Canary rollouts for safe prompt updates
  • Automatic agent notification on changes
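
A sketch of what a PromptPack resource might look like given these features; every field name here is an assumption for illustration:

```yaml
apiVersion: omnia.example.com/v1alpha1   # assumed group/version
kind: PromptPack
metadata:
  name: support-prompts
spec:
  configMapRef:
    name: support-prompts-v2       # compiled PromptPack JSON stored in a ConfigMap
  rollout:
    strategy: canary               # assumed field: gradual rollout of a new version
    weight: 10                     # assumed field: percent of sessions on the new version
```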

ToolRegistry defines the tool handlers available to agents:

  • HTTP handlers - REST endpoints with explicit schemas
  • gRPC handlers - gRPC services using the Tool protocol
  • MCP handlers - Self-describing Model Context Protocol servers
  • OpenAPI handlers - Self-describing services with OpenAPI specs
  • Service discovery via label selectors
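
A hypothetical ToolRegistry showing one explicit-schema handler and one self-describing handler; the API group, field names, and the discovery label are assumptions:

```yaml
apiVersion: omnia.example.com/v1alpha1   # assumed group/version
kind: ToolRegistry
metadata:
  name: support-tools
spec:
  handlers:
    - name: ticket-lookup
      type: http                   # explicit schema, REST endpoint
      url: http://tickets.default.svc/lookup
      schema:                      # tool input schema (shape assumed)
        type: object
        properties:
          ticketId:
            type: string
    - name: knowledge-base
      type: mcp                    # self-describing; tools are discovered from the server
      url: http://kb-mcp.default.svc
  discovery:
    selector:
      matchLabels:
        omnia.example.com/tool: "true"   # assumed label for service discovery
```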

Provider configures the LLM provider settings:

  • Provider type (claude, openai, gemini, etc.)
  • Model selection
  • API credentials
  • Custom base URLs
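
A minimal Provider sketch; as above, the schema details are assumptions:

```yaml
apiVersion: omnia.example.com/v1alpha1   # assumed group/version
kind: Provider
metadata:
  name: claude-provider
spec:
  type: claude                     # provider type
  model: claude-sonnet-4-5         # illustrative model id
  apiKeySecretRef:                 # assumed field: credentials come from a Secret
    name: anthropic-credentials
    key: api-key
  baseURL: https://api.anthropic.com   # optional custom base URL
```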

A typical conversation turn with a tool call flows through the system like this:

```mermaid
sequenceDiagram
    participant C as Client
    participant F as Facade
    participant R as Runtime
    participant TM as Tool Manager
    participant T as Tool Service
    C->>F: WebSocket message
    F->>R: gRPC request
    R->>R: Send to LLM
    R-->>R: LLM returns tool_call
    R->>TM: Execute tool
    TM->>T: Route to adapter (HTTP/gRPC/MCP/OpenAPI)
    T-->>TM: Tool result
    TM-->>R: Return result
    R->>R: Send result to LLM
    R-->>F: Stream response
    F-->>C: WebSocket chunks
```

The Tool Manager routes calls to the appropriate adapter based on handler type:

```mermaid
graph LR
    TM[Tool Manager] --> HTTP[HTTP Adapter]
    TM --> GRPC[gRPC Adapter]
    TM --> MCP[MCP Adapter]
    TM --> OA[OpenAPI Adapter]
    HTTP --> HS[REST Service]
    GRPC --> GS[gRPC Service]
    MCP --> MS[MCP Server]
    OA --> OS[OpenAPI Service]
```

End to end, a request proceeds as follows:

  1. Client sends a message via WebSocket
  2. Facade creates/resumes session and forwards to Runtime
  3. Runtime sends message to LLM via PromptKit
  4. LLM returns tool call request
  5. Tool Manager routes call to appropriate adapter
  6. Adapter executes tool and returns result
  7. Result sent back to LLM for final response
  8. Response streamed back through Facade to client

Omnia provides comprehensive observability through OpenTelemetry:

The runtime container creates spans for:

  • Conversation turns - End-to-end request processing
  • LLM calls - Time spent in provider API calls
  • Tool executions - Individual tool call latency

Traces include:

  • Session ID for correlation
  • Token usage (input/output)
  • Cost information
  • Tool results (success/error)

The operator and agent containers expose Prometheus metrics:

  • Request latency histograms
  • Tool call counts and durations
  • Session counts
  • LLM token usage
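
If you run the Prometheus Operator, a ServiceMonitor along these lines could scrape them. The label selector and port name are assumptions about how Omnia labels its Services:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: omnia-agents
spec:
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: omnia   # assumed label on agent Services
  endpoints:
    - port: metrics                         # assumed metrics port name
      interval: 30s
```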

Enable tracing via environment variables:

```yaml
env:
  - name: OMNIA_TRACING_ENABLED
    value: "true"
  - name: OMNIA_TRACING_ENDPOINT
    value: "otel-collector.observability:4317"
  - name: OMNIA_TRACING_SAMPLE_RATE
    value: "1.0"
```

We chose the operator pattern because:

  1. Native integration - Agents are first-class Kubernetes citizens
  2. Declarative configuration - Define desired state, not procedures
  3. Self-healing - Automatic recovery from failures
  4. Scalability - Leverage Kubernetes scaling mechanisms

Separating facade and runtime enables:

  1. Separation of concerns - Client handling vs LLM processing
  2. Independent scaling - Different resource requirements
  3. Protocol flexibility - Easy to add new client protocols
  4. Testability - Components can be tested in isolation
  5. Language flexibility - Containers can use different languages

WebSocket was chosen for the client facade because:

  1. Streaming - Essential for LLM response streaming
  2. Bidirectional - Enables tool calls and results
  3. Persistent - Maintains connection for multi-turn conversations
  4. Efficient - Lower overhead than HTTP polling

Separating prompts from agents allows:

  1. Reusability - Same prompts across multiple agents
  2. Versioning - Track prompt changes independently
  3. Safe rollouts - Canary deployments for prompts
  4. Separation of concerns - Prompt engineers vs DevOps

The handler abstraction enables:

  1. Self-describing services - MCP and OpenAPI discover tools automatically
  2. Explicit schemas - HTTP and gRPC tools define their interface
  3. Unified management - All tool types in one registry
  4. Dynamic updates - Add/remove tools without redeploying agents

The diagram below summarizes how the custom resources relate to one another and to the objects the operator manages:

```mermaid
graph LR
    AR[AgentRuntime] -->|references| PP[PromptPack]
    AR -->|references| TR[ToolRegistry]
    AR -->|references| PR[Provider]
    AR -->|creates| D[Deployment]
    AR -->|creates| S[Service]
    PP -->|source| CM1[ConfigMap]
    TR -->|discovers| SVC[Services]
    TR -->|generates| CM2[Tools ConfigMap]
    PR -->|credentials| SEC[Secret]
    D -->|contains| FC[Facade Container]
    D -->|contains| RC[Runtime Container]
```

When an AgentRuntime is created or updated:

  1. Validate the referenced PromptPack exists
  2. Optionally validate the referenced ToolRegistry
  3. Fetch Provider configuration
  4. Generate tools ConfigMap from ToolRegistry
  5. Build the pod spec with facade and runtime containers
  6. Create or update the Deployment
  7. Create or update the Service
  8. Update the AgentRuntime status
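
After a successful pass, the reported status might look roughly like this; the exact fields and condition types are assumptions:

```yaml
status:
  observedGeneration: 3
  phase: Ready                    # assumed field
  conditions:
    - type: Ready
      status: "True"
      reason: ReconcileSucceeded  # assumed condition reason
```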

When a ToolRegistry changes:

  1. Process handlers (HTTP, gRPC, MCP, OpenAPI)
  2. Discover tools from self-describing handlers
  3. Update discovered tools in status
  4. Find all AgentRuntimes referencing this ToolRegistry
  5. Regenerate tools ConfigMaps for affected agents

API credentials are handled as follows:

  • API keys are stored in Kubernetes Secrets
  • Secrets are exposed to containers as environment variables rather than mounted as files
  • Secrets can live in the same namespace as the agent or in a different one
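
In the generated pod spec this is a standard secretKeyRef; the variable name, Secret name, and key below are illustrative:

```yaml
env:
  - name: ANTHROPIC_API_KEY        # assumed variable name
    valueFrom:
      secretKeyRef:
        name: anthropic-credentials
        key: api-key
```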

Consider implementing NetworkPolicies to:

  • Restrict agent egress to allowed LLM providers
  • Limit tool access to specific services
  • Isolate agent namespaces
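
For example, a policy along these lines restricts agent pods to HTTPS egress (for provider APIs) plus DNS; the pod label is an assumption about how Omnia labels agent pods:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/managed-by: omnia   # assumed agent pod label
  policyTypes:
    - Egress
  egress:
    - ports:                                # HTTPS to LLM provider APIs
        - protocol: TCP
          port: 443
    - ports:                                # DNS resolution
        - protocol: UDP
          port: 53
```

Tighten the rules further with ipBlock or namespace selectors once you know which providers and tool services each agent actually needs.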

The operator requires specific permissions:

  • Full access to the Omnia CRDs
  • Read access to Secrets
  • Create/Update access to ConfigMaps (for generated tools configuration), Deployments, and Services
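
A ClusterRole granting roughly this set of permissions might look as follows; the CRD group is an assumption:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: omnia-operator
rules:
  - apiGroups: ["omnia.example.com"]        # assumed CRD group
    resources: ["*"]
    verbs: ["*"]                            # full access to Omnia CRDs
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps", "services"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
```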