Multi-Tenancy Architecture
This document explains how Omnia provides multi-tenancy through Workspaces, the design decisions behind the architecture, and how it scales from small teams to enterprise deployments.
Overview
Section titled “Overview”Workspaces provide logical isolation for teams sharing an Omnia cluster. Each workspace has:
- Dedicated namespace - Kubernetes namespace for resource isolation
- Role-based access - Three roles (owner, editor, viewer) with scoped permissions
- Resource quotas - Limits on compute, objects, and Omnia-specific resources
- Network isolation - Automatic NetworkPolicy generation to restrict cross-tenant traffic
- Cost attribution - Tags for tracking spend by team
graph TB subgraph cluster["Kubernetes Cluster"] subgraph ws1["Workspace: Team A"] ns1[Namespace: omnia-team-a] sa1[ServiceAccounts] rb1[RoleBindings] q1[ResourceQuota] np1[NetworkPolicy] a1[Agents] end
subgraph ws2["Workspace: Team B"] ns2[Namespace: omnia-team-b] sa2[ServiceAccounts] rb2[RoleBindings] q2[ResourceQuota] np2[NetworkPolicy] a2[Agents] end
subgraph shared["Shared Resources"] providers[Providers] tools[ToolRegistries] end
a1 --> providers a1 --> tools a2 --> providers a2 --> tools end
users1((Team A Users)) --> ws1 users2((Team B Users)) --> ws2Isolation Model
Section titled “Isolation Model”Namespace Isolation
Section titled “Namespace Isolation”Each workspace maps to exactly one Kubernetes namespace. This provides:
- Resource scoping - Agents, PromptPacks, and Arena jobs are namespace-scoped
- Network isolation - NetworkPolicies can restrict cross-namespace traffic
- RBAC boundaries - Permissions are scoped to the namespace
The controller creates the namespace when spec.namespace.create: true:
spec: namespace: name: omnia-customer-support create: true labels: environment: productionRole-Based Access Control
Section titled “Role-Based Access Control”Workspaces define three roles with increasing permissions:
| Role | View Resources | Create/Edit | Delete | Manage Members |
|---|---|---|---|---|
| viewer | Yes | No | No | No |
| editor | Yes | Yes | Yes | No |
| owner | Yes | Yes | Yes | Yes |
The controller creates a ServiceAccount and RoleBinding for each role:
graph LR subgraph workspace["Workspace Controller"] ws[Workspace CRD] end
subgraph namespace["Workspace Namespace"] sa1[SA: ws-owner] sa2[SA: ws-editor] sa3[SA: ws-viewer] rb1[RoleBinding → owner] rb2[RoleBinding → editor] rb3[RoleBinding → viewer] end
subgraph clusterRoles["Cluster Roles"] cr1[omnia-workspace-owner] cr2[omnia-workspace-editor] cr3[omnia-workspace-viewer] end
ws --> sa1 ws --> sa2 ws --> sa3 sa1 --> rb1 sa2 --> rb2 sa3 --> rb3 rb1 --> cr1 rb2 --> cr2 rb3 --> cr3Authorization Flow
Section titled “Authorization Flow”Dashboard Authorization
Section titled “Dashboard Authorization”When a user accesses the dashboard, authorization happens at the application layer:
sequenceDiagram participant U as User participant D as Dashboard participant K as Kubernetes API
U->>D: Request workspace resources D->>D: Extract user identity (OIDC claims) D->>K: Get Workspace CRD K-->>D: Workspace spec D->>D: Match user groups to roleBindings D->>D: Determine role (owner/editor/viewer) D->>K: Get SA token for role K-->>D: Short-lived token D->>K: API call with SA token K-->>D: Resources D-->>U: Scoped response- Identity extraction - User’s email and groups from OIDC token
- Role determination - Match groups against
spec.roleBindings - Token acquisition - Fetch ServiceAccount token for the determined role
- Scoped API calls - Use token to make workspace-scoped K8s API calls
Why Application-Layer Authorization?
Section titled “Why Application-Layer Authorization?”Kubernetes RBAC has scaling limitations:
| Scale | Users | RoleBindings | Problem |
|---|---|---|---|
| Small | <50 | <100 | Works well |
| Medium | 50-500 | 500-5000 | etcd pressure, slow reconciliation |
| Large | 500+ | 5000+ | Unmanageable, audit nightmare |
Application-layer authorization keeps the Workspace CRD small (10-20 group entries) while supporting thousands of users. User management happens in your identity provider, not Kubernetes.
Group-Based Access
Section titled “Group-Based Access”Groups are the primary access mechanism:
spec: roleBindings: - groups: - "omnia-admins@acme.com" # Azure AD group - "engineering-team" # Okta group role: editorWhen a user’s JWT contains matching groups, they get the associated role. Multiple groups can map to the same role, and users get the highest privilege role from any matching group.
ServiceAccount Access
Section titled “ServiceAccount Access”For CI/CD and automation, ServiceAccounts use native Kubernetes RBAC:
spec: roleBindings: - serviceAccounts: - name: argocd-application-controller namespace: argocd role: editorThe controller creates actual RoleBindings for ServiceAccounts, allowing direct K8s API access without going through the dashboard.
Token Management
Section titled “Token Management”Short-Lived Tokens
Section titled “Short-Lived Tokens”The dashboard uses the Kubernetes TokenRequest API to get short-lived ServiceAccount tokens:
sequenceDiagram participant D as Dashboard participant K as Kubernetes API
D->>K: TokenRequest (SA, 1h expiry) K-->>D: JWT token D->>D: Cache token (50 min) D->>K: API calls with token Note over D,K: Token auto-refreshes before expiryBenefits:
- Security - Tokens expire automatically
- Least privilege - Token matches user’s role
- Audit - Token identity appears in K8s audit logs
Token Caching
Section titled “Token Caching”Tokens are cached to avoid excessive TokenRequest calls:
- TTL: Tokens have 1-hour expiry
- Cache duration: 50 minutes (refresh before expiry)
- Cache key: workspace + role combination
Shared Resources
Section titled “Shared Resources”Some resources are shared across workspaces:
Providers
Section titled “Providers”LLM providers are typically cluster-wide:
apiVersion: omnia.altairalabs.ai/v1alpha1kind: Providermetadata: name: claude-sonnet namespace: omnia-shared labels: omnia.altairalabs.ai/shared: "true"Agents reference shared providers:
spec: providerRef: name: claude-sonnet namespace: omnia-sharedToolRegistries
Section titled “ToolRegistries”Common tools can be shared:
apiVersion: omnia.altairalabs.ai/v1alpha1kind: ToolRegistrymetadata: name: common-tools namespace: omnia-shared labels: omnia.altairalabs.ai/shared: "true"Access to shared resources requires explicit reference. Workspaces can’t enumerate or modify resources in other namespaces.
Resource Quotas
Section titled “Resource Quotas”Kubernetes ResourceQuota
Section titled “Kubernetes ResourceQuota”The controller can create ResourceQuotas in workspace namespaces:
spec: quotas: compute: requests.cpu: "50" requests.memory: "100Gi"This maps directly to Kubernetes ResourceQuota objects.
Omnia-Specific Quotas
Section titled “Omnia-Specific Quotas”Additional quotas control Omnia resources:
spec: quotas: agents: maxAgentRuntimes: 20 maxReplicasPerAgent: 10 arena: maxConcurrentJobs: 10 maxJobsPerDay: 100These are enforced by admission webhooks (planned) or controller validation.
Design Decisions
Section titled “Design Decisions”Why 1:1 Workspace:Namespace?
Section titled “Why 1:1 Workspace:Namespace?”Simplicity and Kubernetes alignment:
- Native RBAC - Permissions naturally scope to namespaces
- Resource isolation - Standard Kubernetes isolation model
- Familiar model - Teams understand namespace boundaries
- Tool compatibility - Works with existing K8s tooling
Future versions may support multiple namespaces per workspace for complex organizational structures.
Why Not Kubeflow Profiles?
Section titled “Why Not Kubeflow Profiles?”Kubeflow Profiles require Istio and additional infrastructure. Workspaces are:
- Self-contained - No external dependencies
- Lightweight - Minimal resource overhead
- Focused - Built for Omnia’s specific needs
Why Not Capsule or HNC?
Section titled “Why Not Capsule or HNC?”Third-party multi-tenancy solutions add operational complexity:
- Capsule - Powerful but requires learning new abstractions
- HNC - Archived in April 2025
- Workspaces - Integrated into Omnia, purpose-built
Why Application-Layer Auth for Users?
Section titled “Why Application-Layer Auth for Users?”Kubernetes RBAC doesn’t scale for human users:
- RoleBinding explosion - N users × M workspaces = N×M bindings
- No dynamic groups - Changes require RoleBinding updates
- No directory sync - Manual user provisioning
Application-layer auth with IdP groups provides:
- Scalability - Thousands of users, minimal CRD entries
- Dynamic membership - Add users in IdP, instant access
- Centralized management - Single source of truth in IdP
Security Considerations
Section titled “Security Considerations”Principle of Least Privilege
Section titled “Principle of Least Privilege”- Users get the minimum role needed
- Anonymous access defaults to viewer
- ServiceAccount tokens are scoped to workspace
Token Security
Section titled “Token Security”- Short-lived tokens (1 hour default)
- Tokens cached server-side, not exposed to browser
- Automatic refresh before expiry
Audit Trail
Section titled “Audit Trail”- Kubernetes audit logs capture all API calls
- Token identity shows which workspace SA was used
- Dashboard can log user actions separately
Network Isolation
Section titled “Network Isolation”Workspaces can automatically generate NetworkPolicies to enforce network boundaries between tenants. When enabled, the controller creates a NetworkPolicy that restricts traffic while allowing essential communication.
spec: networkPolicy: isolate: trueGenerated NetworkPolicy
Section titled “Generated NetworkPolicy”The controller creates a NetworkPolicy named workspace-{name}-isolation with these default rules:
graph TB subgraph workspace["Workspace Namespace"] pods[All Pods] end
subgraph defaults["Default Allowed Traffic"] dns[DNS to kube-system] same[Same Namespace] shared[Shared Namespaces] external[External APIs] end
pods --> dns pods <--> same pods <--> shared pods --> external
style dns fill:#90EE90 style same fill:#90EE90 style shared fill:#90EE90 style external fill:#90EE90| Default Rule | Direction | Purpose |
|---|---|---|
| DNS (port 53) | Egress | Allow pods to resolve DNS |
| Same namespace | Both | Allow intra-workspace communication |
| Shared namespaces | Both | Allow access to shared Providers/Tools |
| External IPs (0.0.0.0/0) | Egress | Allow LLM API calls |
| Private IPs (10.x, 172.16.x, 192.168.x) | Blocked | Prevent cross-tenant access |
Customizing Network Rules
Section titled “Customizing Network Rules”For fine-grained control, add custom ingress and egress rules:
spec: networkPolicy: isolate: true allowExternalAPIs: true # Default: true allowSharedNamespaces: true # Default: true allowFrom: - peers: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: ingress-nginx allowTo: - peers: - ipBlock: cidr: 10.0.0.0/8 # Internal database network ports: - protocol: TCP port: 5432High-Security Mode
Section titled “High-Security Mode”For maximum isolation, disable external API access and explicitly allow only required endpoints:
spec: networkPolicy: isolate: true allowExternalAPIs: false allowTo: # Only allow specific LLM provider - peers: - ipBlock: cidr: 104.18.0.0/16 # Anthropic API ports: - protocol: TCP port: 443This approach provides defense-in-depth: even if application-layer authorization is compromised, network-layer controls prevent unauthorized data exfiltration or cross-tenant access.
Local Development Mode
Section titled “Local Development Mode”For local development where agents need to access services on private networks (e.g., local Ollama):
spec: networkPolicy: isolate: true allowPrivateNetworks: true # Removes RFC 1918 exclusionsThis is useful when running LLM providers locally or accessing internal development services. Do not use in production.
Scaling Considerations
Section titled “Scaling Considerations”Small Deployments (<50 users)
Section titled “Small Deployments (<50 users)”Use the default setup:
- Anonymous mode for development
- Basic OIDC for production
- Single workspace per team
Medium Deployments (50-500 users)
Section titled “Medium Deployments (50-500 users)”Enable full workspace features:
- OIDC with group claims
- Multiple workspaces by team
- Resource quotas per workspace
Large Deployments (500+ users)
Section titled “Large Deployments (500+ users)”Leverage the scalability design:
- IdP groups for all access (no direct grants)
- Automated workspace provisioning via GitOps
- Monitoring and alerting on quota usage
Enterprise (5000+ users)
Section titled “Enterprise (5000+ users)”Consider additional patterns:
- External policy engine (OPA, Cedar) for complex rules
- Hierarchical workspaces (parent/child)
- Cross-workspace sharing policies
Next Steps
Section titled “Next Steps”- Manage Workspaces - Create and configure workspaces
- Configure Dashboard Authentication - Set up OIDC
- Workspace CRD Reference - Complete field reference